Motivation

The history of your name

You can use the data in babynames to make graphs like this, which reveal the history of a name, perhaps your name.

But before you do, you will need to trim down babynames. At the moment, there are more rows in babynames than you need to build your plot.

An example

To see what I mean, consider how I made the plot above. I began with the entire dataset, which, if plotted as a scatterplot, would have looked like this.

library(babynames)
library(dplyr)
library(ggplot2)
ggplot(babynames) +
  geom_point(aes(x = year, y = prop)) +
  labs(title = "Popularity of every name in the dataset")

I then narrowed the data to just the rows that contain my name (and, because the code below also filters on sex, just the rows for boys) before plotting the data with a line geom. Here’s how the rows with just my name look as a scatterplot.

babynames |> 
  filter(name == "Kyle", sex == "M") |> 
  ggplot() +
    geom_point(aes(x = year, y = prop)) +
    labs(title = "Popularity of the name Kyle")

If I had skipped this step, my line graph would’ve connected all of the points in the large dataset, creating an uninformative graph.

ggplot(babynames) +
  geom_line(aes(x = year, y = prop)) +
  labs(title = "Popularity of every name in the dataset")
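For comparison, here is a sketch of the line graph built from the filtered rows instead. It assumes the same babynames, dplyr, and ggplot2 packages used in the code above; the object names kyle and kyle_plot are only for illustration.

```r
library(babynames)
library(dplyr)
library(ggplot2)

# Keep only the rows for one name and one sex, so the line traces a single series.
kyle <- babynames |>
  filter(name == "Kyle", sex == "M")

# Draw the filtered rows with a line geom instead of points.
kyle_plot <- ggplot(kyle) +
  geom_line(aes(x = year, y = prop)) +
  labs(title = "Popularity of the name Kyle")

kyle_plot
```

Because every remaining row belongs to the same name and sex, the line connects one point per year rather than jumping between names.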

Your goal in this section is to repeat this process for your own name (or a name that you choose). Along the way, you will learn a set of functions that isolate information within a dataset.

Isolating data

This type of task occurs often in data science: you need to extract data from a table before you can use it. You can do this task quickly with three functions that come in the {dplyr} package:

  1. select(), which extracts columns from a data frame
  2. filter(), which extracts rows from a data frame
  3. arrange(), which reorders rows so that the ones you care about appear at the top of a data frame

Each function takes a data frame or tibble as its first argument and returns a new data frame or tibble as its output.
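As a rough preview, the sketch below applies each function to a tiny table. The tibble named toy and its values are made up for illustration and are not taken from babynames; the other object names are likewise hypothetical.

```r
library(dplyr)

# Hypothetical toy data: the values are made up for illustration only.
toy <- tibble(
  year = c(1990, 1990, 1991, 1991),
  sex  = c("M", "F", "M", "F"),
  name = c("Kyle", "Anna", "Kyle", "Anna"),
  prop = c(0.004, 0.002, 0.005, 0.001)
)

# select() keeps the named columns and drops the rest.
name_prop <- select(toy, name, prop)

# filter() keeps the rows for which the condition is TRUE.
kyle_rows <- filter(toy, name == "Kyle")

# arrange() reorders rows; desc() puts the largest prop first.
by_prop <- arrange(toy, desc(prop))
```

In each call the data frame comes first and the result is a new table, so toy itself is left unchanged.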
