Code
<- read.csv("https://raw.githubusercontent.com/melaniewalsh/Neat-Datasets/main/1979-2022-National-Park-Visits-By-State.csv",
np_data stringsAsFactors = FALSE)
February 26, 2024
View the np_data dataframe by clicking on the spreadsheet icon in the Global Environment
First, filter the dataframe for a park of your choice. Then, pick a National Park that you haven’t worked with yet, and filter the data for only that park.
ParkName | Region | State | Year | RecreationVisits |
---|---|---|---|---|
Mount Rainier NP | Pacific West | WA | 1979 | 1516703 |
Mount Rainier NP | Pacific West | WA | 1980 | 1268256 |
Mount Rainier NP | Pacific West | WA | 1981 | 1233671 |
Mount Rainier NP | Pacific West | WA | 1982 | 1007300 |
Mount Rainier NP | Pacific West | WA | 1983 | 1106306 |
Mount Rainier NP | Pacific West | WA | 1984 | 1152411 |
Now, make a line plot that shows the number of visits per year to that park from 1979 to 2022.
Choose a color for the line.
Give the plot a title that also functions as a kind of “headline” for the most interesting story of the plot.
Change the x-axis ticks so that they increase 5 years at a time.
Change the y-axis tick labels so that they abbreviate millions to M and thousands to K.
ggplot(my_parks_df) +
geom_line(aes(
x = Year,
y = RecreationVisits
),
color = "green") +
scale_x_continuous(
breaks = seq(from = 1980, to = 2020, by = 5),
) +
scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()),
limits = c(0, 2000000)) +
labs(title = "Visits to Mt. Rainier Are Surprisingly Stable")
Now, create a plot that zooms in on the most interesting time period for this particular National Park.
Change the x-axis limits so that it only shows the most interesting years.
Come up with a new title that describes this time period.
ggplot(my_parks_df) +
geom_line(aes(
x = Year,
y = RecreationVisits
),
color = "green") +
scale_x_continuous(
breaks = seq(from = 1980, to = 2020, by = 5),
limits = c(2005, 2023),
) +
scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()),
limits = c(0, 2000000)) +
labs(title = "After a COVID Dip, Mt. Rainier Visits Are Higher Than Ever")
---
title: "ggplot Customization with National Park Visitation Data (Solution)"
date: "2024-02-26"
categories: [ggplot, advanced, solution]
#image: "https://upload.wikimedia.org/wikipedia/commons/e/ed/US-Mexico_barrier_map.png"
format:
html:
code-links:
- text: R Script
href: Ggplot-Customization-NP-Solutions.R
icon: file-code
code-overflow: wrap
code-fold: show
editor: visual
df-print: kable
R.options:
warn: false
code-tools: true
execute:
eval: true
message: false
warning: false
---
# <span style="color:red;"> Solution </span>
<a href="Ggplot-Customization-NP-Solutions.R" download="Ggplot-Customization-NP-Solutions.R">Download as R Script</a>
<span style="color:green;"> [Exercise Without Solutions](Intro-to-DPLYR-NP.qmd) </span>
## Load National Park Visitation data
```{r}
np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/Neat-Datasets/main/1979-2022-National-Park-Visits-By-State.csv",
stringsAsFactors = FALSE)
```
View the np_data dataframe by clicking on the spreadsheet icon in the Global Environment
## Load libraries
```{r}
#| message: false
library("dplyr")
library("stringr")
library("ggplot2")
library("scales")
```
- How have visits to a particular National Park changed over time?
- What is the most interesting period of change?
# Exercise 1
First, filter the dataframe for a park of your choice. Then, pick a National Park that you haven't worked with yet, and filter the data for only that park.
```{r}
my_parks_df <- np_data %>%
filter(ParkName == "Mount Rainier NP")
head(my_parks_df)
```
# Exercise 2
Now, make a line plot that shows the number of visits per year to that park from 1979 to 2022.
### 2a.
Choose a color for the line.
### 2b.
Give the plot a title that also functions as a kind of "headline" for the most interesting story of the plot.
### 2c.
Change the x-axis ticks so that they increase 5 years at a time.
### 2d.
Change the y-axis tick labels so that they abbreviate millions to M and thousands to K.
```{r}
ggplot(my_parks_df) +
geom_line(aes(
x = Year,
y = RecreationVisits
),
color = "green") +
scale_x_continuous(
breaks = seq(from = 1980, to = 2020, by = 5),
) +
scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()),
limits = c(0, 2000000)) +
labs(title = "Visits to Mt. Rainier Are Surprisingly Stable")
```
## Exercise 3
Now, create a plot that zooms in on the most interesting time period for this particular National Park.
### 3a.
Change the x-axis limits so that it only shows the most interesting years.
### 3b.
Come up with a new title that describes this time period.
```{r}
ggplot(my_parks_df) +
geom_line(aes(
x = Year,
y = RecreationVisits
),
color = "green") +
scale_x_continuous(
breaks = seq(from = 1980, to = 2020, by = 5),
limits = c(2005, 2023),
) +
scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()),
limits = c(0, 2000000)) +
labs(title = "After a COVID Dip, Mt. Rainier Visits Are Higher Than Ever")
```