Load the data

np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/Neat-Datasets/main/1979-2020-National-Park-Visits-By-State.csv", stringsAsFactors = FALSE)
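Before summarizing, it can help to check the column types and look for missing values, because mean() returns NA if any value in a group is NA. The sketch below uses a small made-up data frame, toy_visits, as a stand-in for np_data so it runs without a network connection. The column names ParkName, State, and RecreationVisits match those used in the exercises; the values are invented for illustration.

```r
# Hypothetical stand-in for np_data: made-up values, not real visit counts.
toy_visits <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park C"),
  State = c("WA", "WA", "WA", "UT"),
  RecreationVisits = c(100, 300, 50, NA),
  stringsAsFactors = FALSE
)

str(toy_visits)      # column names and types
head(toy_visits, 3)  # first few rows

# Count missing values in each column
na_counts <- colSums(is.na(toy_visits))
na_counts

# mean() propagates NA unless na.rm = TRUE is set
mean_with_na <- mean(toy_visits$RecreationVisits)
mean_no_na <- mean(toy_visits$RecreationVisits, na.rm = TRUE)
```

The same calls can be run on np_data itself. If any column reports missing values, adding na.rm = TRUE inside mean() is one way to handle them.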

Load the dplyr library

library("dplyr")


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

avg_state_visits <- np_data %>% 
                    group_by(State) %>%
                    summarize(avg_visits = mean(RecreationVisits))
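One possible way to view the result and spot the highest and lowest averages is to sort the summary with arrange(). This sketch uses a made-up data frame, toy_visits, in place of np_data. The names toy_visits and toy_avg and all of the values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for np_data: made-up values, not real visit counts.
toy_visits <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park C", "Park C"),
  State = c("WA", "WA", "WA", "UT", "UT"),
  RecreationVisits = c(100, 300, 50, 400, 600),
  stringsAsFactors = FALSE
)

# Same group_by/summarize pattern as the exercise, then sort high to low
toy_avg <- toy_visits %>%
  group_by(State) %>%
  summarize(avg_visits = mean(RecreationVisits)) %>%
  arrange(desc(avg_visits))

# Printing the tibble shows the highest average first and the lowest last
print(toy_avg)
```

Applied to the exercise, the same arrange(desc(avg_visits)) step can be added after summarize(), or run separately on avg_state_visits.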

Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

avg_park_visits <- np_data %>% 
                    group_by(ParkName, State) %>%
                    summarize(avg_visits = mean(RecreationVisits))

`summarise()` has grouped output by 'ParkName'. You can override using the
`.groups` argument.
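The message above appears because summarize() removes only the last grouping variable (State), so the result is still grouped by ParkName. A minimal sketch of one way to return an ungrouped result and silence the message is to pass .groups = "drop", assuming dplyr 1.0.0 or later. The data frame toy_visits and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for np_data: made-up values, not real visit counts.
toy_visits <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park C", "Park C"),
  State = c("WA", "WA", "WA", "UT", "UT"),
  RecreationVisits = c(100, 300, 50, 400, 600),
  stringsAsFactors = FALSE
)

# .groups = "drop" removes all grouping from the result, so no message is shown
toy_park_avg <- toy_visits %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits), .groups = "drop")

# Check whether the result is still grouped
still_grouped <- is_grouped_df(toy_park_avg)
still_grouped
```

Dropping the groups matters mostly for later steps: on a grouped result, verbs such as mutate() operate within each ParkName group rather than over the whole table.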

Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.

distinct_parks <-   np_data %>% 
                    group_by(State) %>%
                    summarize(num_parks = n_distinct(ParkName))
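To see which states have the most and fewest parks, the counts can be sorted. The sketch below also shows an alternative with distinct() and count() as a cross-check. It is one possible approach rather than the intended solution, and toy_visits, toy_parks, toy_parks_alt, and all values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for np_data: made-up values, not real visit counts.
toy_visits <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park C", "Park C"),
  State = c("WA", "WA", "WA", "UT", "UT"),
  RecreationVisits = c(100, 300, 50, 400, 600),
  stringsAsFactors = FALSE
)

# Same pattern as the exercise, sorted from most parks to fewest
toy_parks <- toy_visits %>%
  group_by(State) %>%
  summarize(num_parks = n_distinct(ParkName)) %>%
  arrange(desc(num_parks))

# Alternative: keep one row per State/ParkName pair, then count rows per state
toy_parks_alt <- toy_visits %>%
  distinct(State, ParkName) %>%
  count(State, name = "num_parks")

print(toy_parks)
print(toy_parks_alt)
```

Both versions count each park once per state, no matter how many rows of visit data it has.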

Discuss/consider: Which states have the most and fewest National Parks? What patterns or surprises do you notice?

DPLYR Groupby with National Park Visitation Data (Solution)
