dplyr group_by with National Park Visitation Data (Solution)

dplyr
exercise
solution
Published

August 1, 2024

Load the data

Code
np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/Neat-Datasets/main/1979-2020-National-Park-Visits-By-State.csv", stringsAsFactors = FALSE)
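
Before grouping, it can help to confirm the column names and types that read.csv() produced. The sketch below uses a tiny inline CSV with made-up values (toy is a hypothetical stand-in, not the real np_data) that borrows the column names used in the exercises. Note that stringsAsFactors = FALSE is already the default in R 4.0 and later, so spelling it out mainly matters on older versions.

```r
# Toy CSV with made-up values; only the column names mirror the exercises
csv_text <- "State,ParkName,RecreationVisits
AA,Park A,100
BB,Park B,250"

# read.csv() accepts literal text through the `text` argument
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

names(toy)  # column names
str(toy)    # column types: text columns stay character, not factor
```

The same two calls, names(np_data) and str(np_data), would work on the real data after loading it.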

Load the dplyr library

Code
library("dplyr")

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

Code
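# Group rows by state, then average the visits within each state
avg_state_visits <- np_data %>%
                    group_by(State) %>%
                    summarize(avg_visits = mean(RecreationVisits))
# View the resulting dataframe
avg_state_visits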
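
One thing to watch for with mean(): if a group contains a missing value, the group's mean is NA unless na.rm = TRUE is passed. Whether np_data actually contains missing values is not stated here, so the sketch below uses a small made-up data frame (toy_na is hypothetical) purely to show the difference.

```r
library(dplyr)

# Toy data with made-up values; state AA has one missing visit count
toy_na <- data.frame(
  State = c("AA", "AA", "BB", "BB"),
  RecreationVisits = c(100, NA, 50, 150)
)

na_demo <- toy_na %>%
  group_by(State) %>%
  summarize(
    avg_default = mean(RecreationVisits),               # NA if any value is missing
    avg_na_rm   = mean(RecreationVisits, na.rm = TRUE)  # ignores missing values
  )

na_demo
```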

Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?
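
One possible way to answer this is to sort the summary with arrange(), so the highest and lowest averages sit at the top and bottom of the table. This is a minimal sketch on made-up data (toy_states and its values are hypothetical); the same arrange() step could be appended to the pipeline that builds avg_state_visits.

```r
library(dplyr)

# Toy data with made-up values; column names mirror np_data
toy_states <- data.frame(
  State = c("AA", "AA", "BB", "BB", "CC", "CC"),
  RecreationVisits = c(100, 300, 50, 150, 1000, 3000)
)

toy_state_avg <- toy_states %>%
  group_by(State) %>%
  summarize(avg_visits = mean(RecreationVisits)) %>%
  arrange(desc(avg_visits))  # highest average first, lowest last

toy_state_avg
```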

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

Code
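# Group by park (State is included so it stays in the output), then average the visits.
# .groups = "drop" returns an ungrouped result and avoids the summarise() grouping message.
avg_park_visits <- np_data %>%
                    group_by(ParkName, State) %>%
                    summarize(avg_visits = mean(RecreationVisits), .groups = "drop")
# View the resulting dataframe
avg_park_visits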

Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?
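
To pull out just the top and bottom parks, slice_max() and slice_min() are one option. A detail worth knowing: after group_by(ParkName, State), summarize() by default leaves the result grouped by ParkName, so a later slice_max() would pick a "top" row within every park. Dropping the grouping first avoids that. The sketch below uses made-up data (toy_parks and its values are hypothetical).

```r
library(dplyr)

# Toy data with made-up values; column names mirror np_data
toy_parks <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park B", "Park C", "Park C"),
  State = c("AA", "AA", "AA", "AA", "BB", "BB"),
  RecreationVisits = c(10, 30, 500, 700, 90, 110)
)

toy_park_avg <- toy_parks %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits), .groups = "drop")  # ungrouped result

# With the grouping dropped, these look across all parks at once
busiest  <- toy_park_avg %>% slice_max(avg_visits, n = 1)
quietest <- toy_park_avg %>% slice_min(avg_visits, n = 1)

busiest
quietest
```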

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.

Code
# Count the unique park names within each state
distinct_parks <- np_data %>%
                    group_by(State) %>%
                    summarize(num_parks = n_distinct(ParkName))

Discuss/consider: Which states have the most and fewest National Parks? What patterns or surprises do you notice?
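
The reason for n_distinct(ParkName) rather than n() is that each park appears on many rows, so a plain row count would overstate the number of parks. The sketch below compares the two on made-up data (toy_counts is hypothetical) and sorts the result so the states with the most parks come first.

```r
library(dplyr)

# Toy data with made-up values; each park appears on two rows
toy_counts <- data.frame(
  State = c("AA", "AA", "AA", "AA", "BB", "BB"),
  ParkName = c("Park A", "Park A", "Park B", "Park B", "Park C", "Park C")
)

toy_distinct <- toy_counts %>%
  group_by(State) %>%
  summarize(
    num_parks = n_distinct(ParkName),  # unique parks in the state
    num_rows  = n()                    # all rows, including repeats of the same park
  ) %>%
  arrange(desc(num_parks))             # most parks first

toy_distinct
```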