Example Data

Loading babynames

Before we begin, let’s learn a little about our data. The babynames dataset comes in the {babynames} package. The package is pre-installed for you, just as {ggplot2} was pre-installed in the last tutorial. But unlike in the last tutorial, I have not pre-loaded {babynames}, or any other package.

What does this mean? In R, whenever you want to use a package that is not part of base R, you need to load the package with the command library(). Until you load a package, R will not be able to find the datasets and functions it contains. For example, if we asked R to display the babynames dataset right now, we would get the error below, because we have not yet loaded the {babynames} package.

babynames
Error in eval(expr, envir, enclos): object 'babynames' not found

To load the {babynames} package, you would run the command library(babynames). After you load a package, R will be able to find its contents until you close R. The next time you open R, you will need to reload the package if you wish to use it again.
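To see roughly what loading changes, the sketch below checks R's search path before and after a call to library(). It uses {splines}, which ships with R but is not attached by default, as a stand-in for {babynames}. That substitution is an assumption made only so the example runs on any R installation, and the variable names are illustrative.

```r
# Sketch: see what library() changes, using {splines} as a stand-in
# for {babynames} (an assumption, so the code runs on any R install).
pkg_entry <- "package:splines"

# search() lists the attached packages that R looks through for objects
attached_before <- pkg_entry %in% search()

library(splines)

attached_after <- pkg_entry %in% search()

# Report whether the package was attached before and after library()
print(c(before = attached_before, after = attached_after))
```

In a fresh R session the package should only appear on the search path after the library() call, and it disappears again when you restart R.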

This might sound like an inconvenience, but choosing which packages to load keeps your R experience simple and orderly.

In the chunk below, load {babynames} (the package) and then open the help page for babynames (the dataset). Be sure to read the help page before going on.

library(babynames)
?babynames

The data

Now that you know a little about the dataset, let’s examine its contents. If you were to run babynames at your R console, you would get output that looks like this:

babynames

#> 187     1880   F       Christina    65 6.659495e-04
#> 188     1880   F           Lelia    65 6.659495e-04
#> 189     1880   F           Nelle    65 6.659495e-04
#> 190     1880   F             Sue    65 6.659495e-04
#> 191     1880   F         Johanna    64 6.557041e-04
#> 192     1880   F           Lilly    64 6.557041e-04
#> 193     1880   F         Lucinda    63 6.454587e-04
#> 194     1880   F         Minerva    63 6.454587e-04
#> 195     1880   F          Lettie    62 6.352134e-04
#> 196     1880   F           Roxie    62 6.352134e-04
#> 197     1880   F         Cynthia    61 6.249680e-04
#> 198     1880   F          Helena    60 6.147226e-04
#> 199     1880   F           Hilda    60 6.147226e-04
#> 200     1880   F           Hulda    60 6.147226e-04
#>  [ reached getOption("max.print") -- omitted 1825233 rows ]

Yikes. What is happening?

Displaying large data

babynames is a large data frame, and base R does not display large data frames well. R prints rows until it reaches the limit set by getOption("max.print"), as the last line of the output shows. At that point, R stops, and once the output has scrolled past you are left looking at an arbitrary section of your data.
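The cutoff can be reproduced on a small scale. The sketch below assumes that printing a data frame shows about max.print divided by the number of columns rows. The output above stops at row 200, which would be consistent with five columns and a limit of 1,000 entries, although that limit is an inference from the output and not something stated here. toy_df and the limit of 1000 are made up for illustration.

```r
# Sketch: relate the max.print limit to the number of rows printed.
# toy_df and the limit of 1000 are illustrative assumptions.
toy_df <- data.frame(id = seq_len(5000), value = seq_len(5000) * 2)

max_entries <- 1000
rows_shown <- max_entries %/% ncol(toy_df)  # whole rows that fit in the limit

# Temporarily lower the limit, capture the printed lines, then restore it
old_opts <- options(max.print = max_entries)
printed <- capture.output(print(toy_df))
options(old_opts)

rows_shown
tail(printed, 1)  # the "omitted ... rows" notice
```

Raising max.print would show more rows, but that only makes the scrolling problem worse for a dataset of this size.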

You can avoid this behavior by transforming your data frame to a tibble.
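As a minimal sketch of that conversion, the code below builds a large toy data frame and converts it with as_tibble() from the {tibble} package. toy_df is an illustrative stand-in for babynames, and the code assumes {tibble} is installed, skipping the conversion if it is not.

```r
# Sketch: convert a large data frame to a tibble for compact printing.
# toy_df is an illustrative assumption standing in for babynames.
toy_df <- data.frame(id = seq_len(100000), value = seq_len(100000) * 2)

if (requireNamespace("tibble", quietly = TRUE)) {
  toy_tbl <- tibble::as_tibble(toy_df)
  print(toy_tbl)  # prints only the first few rows plus a summary line
} else {
  message("The {tibble} package is not installed; skipping the conversion.")
}
```

The data are unchanged by the conversion; only the way the object prints is different.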
