These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.

Concepts covered:

Groupby with group_by() and summarize()
Aggregation (mean, distinct count)
Descriptive statistics by category

Load the data

np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv", stringsAsFactors = FALSE)
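
The effect of stringsAsFactors = FALSE can be seen without downloading anything. The sketch below is only an illustration: it reads a tiny made-up CSV from a string, and the column names (park, state, year, visits) are placeholders, not necessarily the names in the real file.

```r
# Made-up CSV text standing in for the real file (hypothetical columns and values)
csv_text <- "park,state,year,visits
Alpha,AA,2020,100
Alpha,AA,2021,300
Beta,BB,2020,50"

# read.csv() accepts a `text` argument in place of a file path or URL
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns are kept as character vectors rather than converted to factors
str(toy)

# List the column names before writing any group_by() code
names(toy)
```

Running str(np_data) and names(np_data) on the real data in the same way shows which columns are available for the exercises. In R 4.0.0 and later, stringsAsFactors = FALSE is already the default, so the argument mainly matters on older installations.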

Load dplyr library

library("dplyr")
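
As a warm-up, the group-then-summarize pattern can be sketched on a small made-up data frame. The data frame toy_visits and its columns (state, park, year, visits) are invented for illustration; the real dataset's column names may differ, so check names(np_data) first.

```r
library(dplyr)

# Hypothetical toy data: two parks, two years each (values are made up)
toy_visits <- data.frame(
  state  = c("AA", "AA", "BB", "BB"),
  park   = c("Alpha", "Alpha", "Beta", "Beta"),
  year   = c(2020, 2021, 2020, 2021),
  visits = c(100, 300, 50, 150)
)

# group_by() splits the rows by state;
# summarize() collapses each group to a single row
toy_avg <- toy_visits %>%
  group_by(state) %>%
  summarize(avg_visits = mean(visits))

toy_avg
```

The result has one row per group and one column per summary. If a column contains missing values, mean(visits, na.rm = TRUE) skips them instead of returning NA.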

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

# Your code here

Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?
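
One way to find the highest and lowest values in a summary table is to sort it. The following is a minimal sketch on made-up numbers; toy_avg and its columns are hypothetical stand-ins for whatever your own summary contains.

```r
library(dplyr)

# Hypothetical summary table (values are made up)
toy_avg <- data.frame(
  state      = c("AA", "BB", "CC"),
  avg_visits = c(200, 100, 450)
)

# arrange() sorts rows; desc() puts the largest values first
toy_sorted <- toy_avg %>%
  arrange(desc(avg_visits))

# After sorting, the first row is the highest and the last row is the lowest
top_state    <- head(toy_sorted, 1)
bottom_state <- tail(toy_sorted, 1)

toy_sorted
```

Sorting the whole table, rather than pulling out only the maximum and minimum, also makes it easier to spot patterns in the middle of the ranking.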

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

# Your code here

Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.
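
A hint on counting: if each park appears once per year (an assumption suggested by the 1979–2024 range, worth verifying on the real data), then counting rows per state counts park-years rather than parks. The sketch below contrasts n() with n_distinct() on a made-up data frame whose column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: state AA has two parks, state BB has one (values are made up)
toy_visits <- data.frame(
  state = c("AA", "AA", "AA", "BB", "BB"),
  park  = c("Alpha", "Alpha", "Gamma", "Beta", "Beta"),
  year  = c(2020, 2021, 2020, 2020, 2021)
)

toy_counts <- toy_visits %>%
  group_by(state) %>%
  summarize(
    n_rows  = n(),              # counts every row, so repeated parks are counted again
    n_parks = n_distinct(park)  # counts each park name once per state
  )

toy_counts
```

In this toy example AA has three rows but only two distinct parks, which is the difference the exercise is asking about.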

# Your code here

Discuss/consider: Which states have the most and fewest National Parks? What patterns or surprises do you notice?

DPLYR Groupby with National Park Visitation Data (Exercise)