Primer I: Quick Start

This primer will lead you through the steps to make a simple plot (maybe your first!) with R, a free and open-source software environment for statistical computing and graphics. You can read more about it here https://www.r-project.org.

Our end goal is to get you looking at a screen like this:

final screenshot

Packages

The version of R that you just downloaded is considered base R, which provides you with good but basic statistical computing and graphics powers. For analytical and graphical super-powers, you’ll need to install add-on packages, which are user-written, to extend/expand your R capabilities.

R packages can live in one of two places:

  • They may be carefully curated by CRAN (https://cran.r-project.org/). CRAN stands for the “Comprehensive R Archive Network”, which involves a thorough submission and review process for R packages. Packages that go through this process and are accepted by CRAN are easy to install from your R console using:

    ```{r}
    install.packages("name_of_package")
    ```

    In your R console, let’s install the remotes package:

    ```{r}
    install.packages("remotes")
    ```
  • Alternatively, they may be available via GitHub. Sometimes, a package is available in both places. Many R packages use GitHub for development, and submit new versions occasionally to CRAN. CRAN packages are often considered to be ‘stable’ versions. Packages from GitHub are considered to be the in-development versions.

    To download an R package from GitHub, you first need to install the remotes package from CRAN, which we just did! Then you can run this code in your R console:

    ```{r}
    library(remotes)                   # load the remotes package
    install_github("username/repo")    # use a function from remotes
    ```

    The install_github() statement is a function, and the part in quotes ("username/repo") refers to URL for the R package on GitHub. For example, we made a package for this book that you can find at https://github.com/dspatterns/dspatterns. To install that package, run the following code in your R console:

    ```{r}
    library(remotes)                   
    install_github("dspatterns/dspatterns")    
    ```

Place your cursor in the console again (where you last typed x and [4] printed on the screen). You can use the first method to install the following packages directly from CRAN, all of which we will use:

Mind your use of quotes carefully with packages.

  • To install a package, you put the name of the package in quotes as in install.packages("name_of_package").
  • To use an already installed package, you must load it first, as in library(name_of_package). You only need to do this once per RStudio session.

Two good rules of thumb when working with packages in R:

  1. Install packages once per workstation/machine. Always use the console when using install.packages().

  2. Load packages once per work session. Each library() call typically goes on its own line when we put our code in an R script or Quarto document.

You can download all of these at once, too:

install.packages(c("dplyr", "ggplot2", "babynames"), dependencies = TRUE)
Heads up

We should formally introduce the combine command, c(), used above. You will use this often- any time you want to combine things into a vector.

c("hello", "my", "name", "is", "alison")
[1] "hello"  "my"     "name"   "is"     "alison"
c(1:3, 20, 50)
[1]  1  2  3 20 50
Heads up

R is case-sensitive, so ?dplyr works but ?Dplyr will not. Likewise, a variable called A is different from a.

Open a new R script in RStudio by going to File --> New File --> R Script. For this first foray into R, we’ll give you the code, so sit back and relax—feel free to copy and paste our code in with some small tweaks.

First we’ll load the packages:

```{r}
library(babynames)  # contains the actual data
library(dplyr)      # for manipulating data
library(ggplot2)    # for plotting data
```

and in the next section, we’ll begin using some functions.

Functions

Here are some critical commands to obtain a high-level overview of your freshly read dataset in R. We’ll call it saying ‘hello’ to your dataset:

glimpse(babynames)
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,…
$ sex  <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",…
$ n    <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016…
head(babynames)
# A tibble: 6 × 5
   year sex   name          n   prop
  <dbl> <chr> <chr>     <int>  <dbl>
1  1880 F     Mary       7065 0.0724
2  1880 F     Anna       2604 0.0267
3  1880 F     Emma       2003 0.0205
4  1880 F     Elizabeth  1939 0.0199
5  1880 F     Minnie     1746 0.0179
6  1880 F     Margaret   1578 0.0162
tail(babynames)
# A tibble: 6 × 5
   year sex   name       n       prop
  <dbl> <chr> <chr>  <int>      <dbl>
1  2017 M     Zyhier     5 0.00000255
2  2017 M     Zykai      5 0.00000255
3  2017 M     Zykeem     5 0.00000255
4  2017 M     Zylin      5 0.00000255
5  2017 M     Zylis      5 0.00000255
6  2017 M     Zyrie      5 0.00000255
names(babynames)
[1] "year" "sex"  "name" "n"    "prop"

If you have done the above and produced sane-looking output, you are ready for the next step. We’ll use the code below to create a new data frame called alison.

alison <- 
  babynames |>
  filter(name == "Alison" | name == "Allison") |>
  filter(sex == "F") 
  • The first bit makes a new dataset called alison that is a copy of the babynames dataset- the |> tells you we are doing some other stuff to it later.

  • The second bit filters our babynames to only keep rows where the name is equal to “Alison” (read == as “exactly equal to”) or “Allison” (read | as “or”.)

  • The third bit applies another filter to keep only those where sex is female.

Let’s check out the data (in two different ways):

alison
# A tibble: 218 × 5
    year sex   name        n      prop
   <dbl> <chr> <chr>   <int>     <dbl>
 1  1905 F     Alison      7 0.0000226
 2  1907 F     Alison      5 0.0000148
 3  1908 F     Allison     6 0.0000169
 4  1910 F     Alison      5 0.0000119
 5  1910 F     Allison     5 0.0000119
 6  1911 F     Allison     9 0.0000204
 7  1912 F     Allison    12 0.0000204
 8  1912 F     Alison      9 0.0000153
 9  1913 F     Alison     12 0.0000183
10  1913 F     Allison     7 0.0000107
# ℹ 208 more rows
glimpse(alison)
Rows: 218
Columns: 5
$ year <dbl> 1905, 1907, 1908, 1910, 1910, 1911, 1912, 1912, 1913, 1913, 1914,…
$ sex  <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Alison", "Alison", "Allison", "Alison", "Allison", "Allison", "A…
$ n    <int> 7, 5, 6, 5, 5, 9, 12, 9, 12, 7, 22, 11, 16, 13, 24, 15, 20, 15, 3…
$ prop <dbl> 2.259e-05, 1.482e-05, 1.692e-05, 1.192e-05, 1.192e-05, 2.037e-05,…

Again, if you have proper-looking output here, move along to plotting the data.

plot <- 
  ggplot(alison, aes(x = year, y = prop, group = name, color = name)) + 
  geom_line()  

Now if you did this right, you will not see your plot! Because we saved the plot with a name (plot), R just saved the object for you. But check out the top right pane in RStudio again: under Values you should see plot, so it is there, you just have to ask for it. Here’s how:

plot 

DIY: Make a New Name Plot

Edit the code above to create a new dataset. Pick two names to compare how popular they each are (these could be different spellings of your own name, like I did, but you can choose any two names that are present in the dataset). Make the new plot, changing the name of the first argument alison in ggplot() to the name of your new dataset.

Save and share

Save your plotting work so you can share your favorite plot with others! You could use the exporting functionality the plot window but you will most likely not like the looks of it. Instead, use ggplot’s ggsave() function for saving a plot with sensible defaults:

```{r}
ggsave("alison_hill.pdf", plot)  # Choose your own filename, keep the .pdf part 
```

Some extra tips are in order:

  1. Specify the output format.

    It’s important to specify the desired output format by providing the appropriate file extension in the filename. For example, if you want to save the plot as a PNG image, use a filename like "plot.png". Similarly, for PDF, use "plot.pdf". So always specify the format explicitly; you’ll then be guaranteed to get the plot saved to the right format.

  2. Adjust the plot dimensions.

    It may be necessary to adjust the dimensions to fit the intended output medium. For example, if you plan to include the plot in a report or presentation slide, you might want to adjust the width and height of the plot to match the desired layout. This can be done using the width and height arguments.

  3. Set the resolution.

    When saving the plot as an image file, it’s sometimes important to consider the resolution (dots per inch or dpi) of of that image. Using a higher resolution gives you a more detailed and sharper image. The default resolution in R is usually 72 dpi. However, for better quality, especially if you plan to print the plot or display it in high-resolution surroundings, you can increase the dpi value by using the dpi argument in ggsave().

Woot! You did it. Plotting with ggplot is fun and the resulting plots do look great.