Pandas Groupby with National Park Visitation Data (Solution)

Categories: pandas, exercise, solution

Published: August 1, 2024

Load the data

import pandas as pd

np_data = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/Neat-Datasets/main/1979-2020-National-Park-Visits-By-State.csv")
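
Before grouping, it can help to confirm that the columns used in the exercises are present. The sketch below reads a tiny in-memory CSV with made-up numbers instead of the real file. The column names State, ParkName, and RecreationVisits come from the exercises; the Year column is an assumption about how the dataset is laid out.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; the visit counts are made up.
sample_csv = io.StringIO(
    "ParkName,State,Year,RecreationVisits\n"
    "Acadia NP,ME,2019,3000000\n"
    "Acadia NP,ME,2020,2500000\n"
    "Zion NP,UT,2019,4000000\n"
)
sample = pd.read_csv(sample_csv)

# Columns the exercises rely on.
required = {"State", "ParkName", "RecreationVisits"}
missing = required - set(sample.columns)

print(sample.head())
print("Missing columns:", missing or "none")
```

The same check could be run on np_data by replacing sample with np_data.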

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

avg_state_visits = np_data.groupby('State')['RecreationVisits'].mean().reset_index()
avg_state_visits
    State  RecreationVisits
0      AK      1.377887e+05
1      AR      1.336286e+06
2      AS      1.627135e+04
3      AZ      1.840670e+06
4      CA      9.730535e+05
5      CO      1.029783e+06
6      FL      4.722664e+05
7      HI      1.307400e+06
8      IN      1.776658e+06
9      KY      1.342024e+06
10     ME      2.909380e+06
11     MI      1.965193e+04
12     MN      2.218540e+05
13     MO      2.465409e+06
14     MT      2.006450e+06
15     ND      5.174973e+05
16     NM      5.423827e+05
17     NV      8.123107e+04
18     OH      2.153601e+06
19     OR      4.796753e+05
20     SC      8.927681e+04
21     SD      8.144197e+05
22     TN      9.506039e+06
23     TX      2.421379e+05
24     UT      1.090876e+06
25     VA      1.518425e+06
26     VI      5.735757e+05
27     WA      1.498398e+06
28     WV      9.854926e+05
29     WY      2.719573e+06

Discuss/consider: Which state has the highest average visits, and which has the lowest? What patterns or surprises do you notice?
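
One way to read off the highest and lowest averages is to sort the grouped result. This is a minimal sketch on a toy dataframe: the park names and visit counts are invented for illustration, and only the column names match the exercise.

```python
import pandas as pd

# Toy data with made-up visit counts; one row per park per year.
toy = pd.DataFrame({
    "State": ["TN", "TN", "AS", "AS", "UT", "UT"],
    "ParkName": ["Park A", "Park A", "Park B", "Park B", "Park C", "Park C"],
    "RecreationVisits": [9000000, 10000000, 10000, 20000, 2000000, 3000000],
})

avg_state_visits = toy.groupby("State")["RecreationVisits"].mean().reset_index()

# Sort from highest to lowest average, then look at both ends.
ranked = avg_state_visits.sort_values("RecreationVisits", ascending=False)
print(ranked.head(1))  # state with the highest average
print(ranked.tail(1))  # state with the lowest average
```

Applying the same sort_values call to the real avg_state_visits dataframe would rank all 30 rows.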

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

avg_park_visits = np_data.groupby('ParkName')['RecreationVisits'].mean().reset_index()
avg_park_visits
    ParkName                      RecreationVisits
0   Acadia NP                         2.909380e+06
1   Arches NP                         8.315601e+05
2   Badlands NP                       1.016205e+06
3   Big Bend NP                       3.064676e+05
4   Biscayne NP                       4.485172e+05
..  ...                                        ...
58  Wind Cave NP                      6.126346e+05
59  Wrangell-St. Elias NP & PRES      4.733967e+04
60  Yellowstone NP                    3.017686e+06
61  Yosemite NP                       3.448971e+06
62  Zion NP                           2.494319e+06

63 rows × 2 columns

Discuss/consider: Which National Park has the highest average visits, and which has the lowest? What patterns or surprises do you notice?
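
Because the displayed output is truncated to ten rows, the extremes may not be visible in it. A possible alternative to scanning the table is to skip reset_index() and ask the grouped Series for its largest and smallest entries directly. The sketch below uses invented park names and visit counts; only the column names come from the exercise.

```python
import pandas as pd

# Toy data with made-up visit counts; one row per park per year.
toy = pd.DataFrame({
    "ParkName": ["Park A", "Park A", "Park B", "Park B", "Park C", "Park C"],
    "RecreationVisits": [100, 300, 5000, 7000, 40, 60],
})

# Without reset_index(), the result is a Series indexed by ParkName.
avg_by_park = toy.groupby("ParkName")["RecreationVisits"].mean()

most_visited = avg_by_park.idxmax()   # index label of the largest mean
least_visited = avg_by_park.idxmin()  # index label of the smallest mean
print(most_visited, avg_by_park[most_visited])
print(least_visited, avg_by_park[least_visited])
```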

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.

distinct_parks = np_data.groupby('State')['ParkName'].nunique().reset_index(name='NumParks')
distinct_parks
    State  NumParks
0      AK         8
1      AR         1
2      AS         1
3      AZ         3
4      CA         9
5      CO         4
6      FL         3
7      HI         2
8      IN         1
9      KY         1
10     ME         1
11     MI         1
12     MN         1
13     MO         1
14     MT         1
15     ND         1
16     NM         2
17     NV         1
18     OH         1
19     OR         1
20     SC         1
21     SD         2
22     TN         1
23     TX         2
24     UT         5
25     VA         1
26     VI         1
27     WA         3
28     WV         1
29     WY         2

Discuss/consider: Which states have the most and the fewest National Parks? What patterns or surprises do you notice?
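
The solution uses nunique() rather than count() because each park appears in many rows (one per year, assuming the dataset is laid out that way). A small sketch with invented rows shows the difference; the Year column and the park names here are hypothetical.

```python
import pandas as pd

# Toy data with made-up rows; each park appears once per year.
toy = pd.DataFrame({
    "State": ["UT", "UT", "UT", "UT", "ME", "ME"],
    "ParkName": ["Park A", "Park A", "Park B", "Park B", "Park C", "Park C"],
    "Year": [2019, 2020, 2019, 2020, 2019, 2020],
})

# nunique() counts each park once per state.
distinct_parks = toy.groupby("State")["ParkName"].nunique().reset_index(name="NumParks")

# count() counts rows, so every extra year inflates the total.
row_counts = toy.groupby("State")["ParkName"].count().reset_index(name="NumRows")

print(distinct_parks.sort_values("NumParks", ascending=False))
print(row_counts)
```

In this toy example UT has 2 distinct parks but 4 rows, which is why counting rows would overstate the number of parks.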