A Appendix A: Vocabulary

You will be responsible for knowing the following functions and vocabulary for the weekly quizzes.

A.1 Quiz 1—R Preliminaries

  • Grading policies for the course
  • Course requirements / policies for in-class quizzes
  • Free and open source software
  • “Free as in beer” versus “free as in speech”
  • Advantages and disadvantages of interpreted languages compared to “point-and-click” software
  • Advantages and disadvantages of interpreted languages compared to compiled languages and assembly languages
  • Difference between R and RStudio
  • R session
  • R console
  • Function arguments (including required versus optional arguments)
  • Accessing a function’s helpfile using ?
  • Mathematical operators: +, -, *, /
  • R objects and object names
  • “gets arrow”: <-
  • Rules and style guidelines for naming objects
  • ls()
  • R scripts
  • # comment character
  • R packages
  • CRAN
  • Installing packages
  • install.packages()
  • Loading a package
  • library()
  • Types of package documentation: vignettes and helpfiles
  • vignette(), option package =
  • Vectors
  • c()
  • Two of the basic classes of vectors: character and numeric
  • class()
  • Square bracket indexing for vectors: [...]
  • Dataframes
  • tibble() function (from the tibble package, re-exported by the dplyr package)
  • select and slice functions to extract values from dataframes
  • read_csv, option skip =
  • str()
  • summary()
  • dim()
  • ncol()
  • nrow()
  • Nate Silver
  • FiveThirtyEight
  • NA for missing values
  • $ to get a column from a dataframe
  • paste(), option sep =
  • paste0()
  • = vs. <- for assignment expressions
  • package::function() notation
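
As a quick review of several of the items above, here is a minimal sketch that uses only base R. The object names (ages, first_names, people) and the values are made up for illustration.

```r
# Create objects with the "gets arrow"
ages <- c(34, 27, NA)                  # numeric vector; NA marks a missing value
first_names <- c("Ana", "Ben", "Cy")   # character vector

class(ages)          # "numeric"
class(first_names)   # "character"

# Square bracket indexing for vectors
first_names[2]       # "Ben"
ages[c(1, 2)]        # 34 27

# Pasting strings together
paste("Quiz", 1, sep = "-")   # "Quiz-1"
paste0("Quiz", 1)             # "Quiz1"

# A small dataframe, with $ to pull out one column
people <- data.frame(name = first_names, age = ages)
people$age           # 34 27 NA
dim(people)          # 3 2
nrow(people)         # 3
ncol(people)         # 2
summary(people$age)  # summary statistics, including a count of NAs

# List the objects in the current R session
ls()
```

Running ?paste in the console opens the helpfile that describes the sep argument used above.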

A.2 Quiz 2—Entering / cleaning data #1

  • What kinds of data can be read into R?
  • delimited files (csv, tsv)
  • fixed width files
  • delimiter
  • read_delim, options delim =, skip =, n_max =, col_names =
  • read_fwf
  • read_csv, options skip =, n_max =, col_names =
  • readxl package and its read_excel() function
  • haven package and its read_sas() function
  • NA
  • Computer directory structure
  • working directory
  • getwd()
  • list.files()
  • relative pathnames
  • absolute pathnames
  • shorthand for pathnames: ., .., ../data, etc.
  • Reading in data from either a local or online flat file
  • paste(), option sep =
  • paste0()
  • How to read flat files of data that are online directly into R
  • dplyr package
  • rename()
  • Why you might want to rename column names (e.g., uppercase, long, unusual characters)
  • select()
  • slice()
  • mutate()
  • filter()
  • arrange(), including with desc()
  • %>%, advantages of piping
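
The reading and cleaning steps above can be sketched as one short pipeline. This is only an illustration: the file contents, column names, and object names are invented, and the file is written to a temporary location so the example is self-contained.

```r
library(readr)
library(dplyr)

# Write a small, made-up CSV file with one line of notes above the header
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Example file with a line of notes above the header",
             "Site Name,Temp,Month",
             "north,12.5,1",
             "south,18.0,1",
             "north,15.2,2",
             "south,21.4,2"),
           csv_path)

# skip = 1 skips the notes line so the header is read correctly
temps <- read_csv(csv_path, skip = 1)

temps_clean <- temps %>%
  rename(site = `Site Name`, temp_c = Temp, month = Month) %>%
  mutate(temp_f = temp_c * 9 / 5 + 32) %>%
  filter(site == "south") %>%
  arrange(desc(temp_f)) %>%
  select(site, month, temp_f)

temps_clean
```

Each step takes the dataframe from the previous step as its first argument, so the pipe avoids both nested function calls and a series of intermediate objects.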

A.3 Quiz 3

  • Main types of vector classes in R: character, numeric, factor, date, logical
  • lubridate functions, including ymd, ymd_hm, mdy, wday, and mday
  • Common logical operators in R (==, !=, %in%, is.na(), &, |)
  • data() (with and without the name of a dataset as an option)
  • library() (with and without an argument in the parentheses)
  • logical vectors, including running sum on a logical vector
  • What the bang operator (!) does to a logical vector or expression
  • The tidyverse
  • min()
  • max()
  • mean()
  • median()
  • summarize()
  • Special functions to use with summarize(): n(), n_distinct(), first(), last()
  • Using group_by() before using summarize()
  • The three basic elements of a ggplot plot: data, aesthetics, and geoms
  • aes function and common aesthetics, including color, shape, x, y, alpha, size, and fill
  • Mapping an aesthetic to a column in the data versus setting it to a constant value
  • Some common geoms: geom_histogram, geom_point, geom_line, geom_boxplot
  • The difference between “statistical” geoms (e.g., geom_boxplot, geom_smooth) and “non-statistical” (e.g., geom_point, geom_line)
  • Common additions to ggplot objects: ggtitle, labs, xlim, ylim, expand_limits
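
Several of these ideas fit together in one small example. The sketch below assumes a made-up dataframe of clinic visits; all object and column names are hypothetical.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

visits <- tibble(
  clinic = c("A", "A", "B", "B", "B"),
  date = ymd(c("2017-01-02", "2017-01-03", "2017-01-02",
               "2017-01-07", "2017-01-08")),
  wait_min = c(10, 25, 5, 40, NA)
)

# Logical vectors: TRUE counts as 1 and FALSE as 0 when summed
long_wait <- visits$wait_min > 20    # FALSE TRUE FALSE TRUE NA
sum(long_wait, na.rm = TRUE)         # 2
!is.na(visits$wait_min)              # ! flips each value: TRUE TRUE TRUE TRUE FALSE

# Date helpers from lubridate
wday(visits$date, label = TRUE)      # day of the week (2017-01-02 was a Monday)
mday(visits$date)                    # 2 3 2 7 8

# group_by() before summarize() gives one row per clinic
clinic_summary <- visits %>%
  group_by(clinic) %>%
  summarize(n_visits = n(),
            mean_wait = mean(wait_min, na.rm = TRUE),
            first_date = first(date))

# Data, aesthetics, and a geom: color is mapped to a column inside aes(),
# while size and alpha are set to constant values outside aes()
wait_plot <- ggplot(visits, aes(x = date, y = wait_min, color = clinic)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(x = "Date", y = "Wait (minutes)") +
  ggtitle("Wait times by clinic")
```

Printing wait_plot draws the graph; ggplot2 warns that it dropped the one row with a missing wait time.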

A.4 Quiz 4

  • Guidelines for good graphics
  • Data density / data-to-ink ratio
  • Small multiples
  • Edward Tufte
  • Hadley Wickham
  • Where to put the + in ggplot statements to avoid problems (ends of lines instead of starts of new lines)
  • Can you save a ggplot object as an R object that you can reference later? If so, how would you add elements on to that object? How would you print it when you were ready to print the graph to your RStudio graphics window?
  • geom_hline(), geom_vline()
  • geom_text()
  • facet_grid(), facet_wrap()
  • grid.arrange() from the gridExtra package
  • ggthemes package, including theme_few() and theme_tufte()
  • Setting point color for geom_point() both as a constant (all points red) and as a way to show the level of a factor for each observation
  • size, alpha, color
  • Re-naming and re-ordering factors
  • Note: If you read this and find and bring in an example of a “small multiples” graph (from a newspaper, a website, an academic paper), you can get one extra point on this quiz
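
One way to save a ggplot object, add to it later, and then print it is sketched below. It uses the mtcars dataset that comes with R, and theme_minimal() from ggplot2 stands in for the ggthemes themes so that no extra package is needed; the object names are made up.

```r
library(ggplot2)

# Save the plot as an R object; color is set to a constant outside aes()
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red")

# Add elements to the saved object, with each + at the end of a line
faceted_plot <- base_plot +
  geom_hline(yintercept = 20, linetype = "dashed") +
  facet_wrap(~ cyl) +
  theme_minimal()

# Map color to a factor instead, to show the level for each observation
factor_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Nothing is drawn until the object is printed
print(faceted_plot)
```

The facet_wrap() call produces a small multiples graph, with one panel for each value of cyl.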

A.5 Quiz 5

  • Reproducible research, including what it is and advantages to aiming to make your research reproducible
  • R style guidelines on variable names, <- vs. =, line length, spacing, semicolons, commenting, indentation, and code grouping
  • Markup languages (concept and examples)
  • Basic conventions for Markdown (bold, italics, links, headers, lists)
  • Literate programming
  • What working directory R uses for code in an .Rmd document
  • Basic syntax for RMarkdown chunks, including how to name them
  • Options for RMarkdown chunks: echo, eval, message, warning, include, fig.width, fig.height, results
  • Difference between global options and chunk options, and which takes precedence
  • What inline code is and how to write it in RMarkdown
  • How to set global options
  • Why style is important in coding
  • RPubs
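
Global options can be set with opts_chunk$set() from the knitr package, as in the sketch below. The chunk name (wait-plot) and the particular option values are only examples.

```r
library(knitr)

# Global options: usually set once, in a chunk near the top of the .Rmd file.
# They apply to every chunk that follows.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Check the current global value of an option
opts_chunk$get("echo")   # FALSE

# A chunk option overrides the global option for that chunk only. For example,
# a chunk named "wait-plot" with the header
#   {r wait-plot, echo = TRUE, fig.width = 5, fig.height = 3}
# would show its code even though the global default is echo = FALSE.
#
# Inline code goes in the text of the document: a backtick, the letter r,
# an R expression, and a closing backtick.
```

Because chunk options take precedence, a common pattern is to set conservative global defaults and override them only in the chunks that need it.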

A.6 Quiz 6

  • Three characteristics of tidy data
  • Five common problems with messy (untidy) data and how to resolve them (make sure you understand the examples shown, which you can find out more about in the Hadley Wickham paper I reference in the slides)
  • group_by with mutate, slice, and arrange
  • separate and unite
  • pivot_longer and pivot_wider
  • The *_join family of functions (left_join, right_join, inner_join, full_join, anti_join, semi_join)
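
The reshaping and joining functions above can be sketched on a small made-up dataset. Here the column names hold values (years), which is one of the common problems in untidy data; all names and numbers are invented for illustration.

```r
library(dplyr)
library(tidyr)

wide <- tibble(
  site = c("north", "south"),
  temp_2019 = c(12.5, 18.0),
  temp_2020 = c(13.1, 18.6)
)

# Move the year out of the column names and into its own column
long <- wide %>%
  pivot_longer(cols = starts_with("temp_"),
               names_to = "measure_year",
               values_to = "value") %>%
  separate(measure_year, into = c("measure", "year"), sep = "_")

# A second table with one row per site
site_info <- tibble(site = c("north", "south"),
                    elevation_m = c(1500, 200))

# left_join keeps every row of long and adds the matching site information
joined <- left_join(long, site_info, by = "site")

# pivot_wider reverses the reshaping, with one column per year
back_to_wide <- long %>%
  pivot_wider(names_from = year, values_from = value)
```

After separate(), the year column is still a character vector; mutate() with as.numeric() would convert it if numeric years were needed.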

A.7 Quiz 7

  • regular expressions
  • difference in the output between str_extract and str_detect
  • str_to_lower, str_to_upper, str_to_title, str_trim
  • lists
  • indexing from lists ([[ and $)
  • exploring lists (class, names, str functions)
  • exploring lists with jsonedit from the listviewer package
  • tidy function from broom applied to the output from a statistical test function
  • forcats package, including fct_recode, fct_infreq, fct_reorder, and fct_lump
  • Bioconductor
  • “accessor” functions for Bioconductor (e.g., get_variable)
  • select with starts_with, select_if
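
A few of the string, list, and factor tools above are sketched here with made-up values. Note the difference in output: str_detect returns a logical vector, while str_extract returns the matching text (or NA when there is no match).

```r
library(stringr)
library(forcats)

ids <- c(" sample_01 ", "Sample_22", "control")

# Clean up whitespace and case before matching
clean_ids <- str_to_lower(str_trim(ids))   # "sample_01" "sample_22" "control"

# A regular expression for "sample_" followed by one or more digits
str_detect(clean_ids, "sample_[0-9]+")     # TRUE TRUE FALSE
str_extract(clean_ids, "[0-9]+")           # "01" "22" NA

# Lists can hold elements of different classes and lengths
study <- list(name = "pilot", n = 3, ids = clean_ids)
class(study)      # "list"
names(study)      # "name" "n" "ids"
study[["n"]]      # 3
study$ids[1]      # "sample_01"
str(study)        # compact overview of the list's structure

# Factor helpers from forcats
groups <- factor(c("trt", "ctl", "trt", "trt"))
by_freq <- fct_infreq(groups)              # levels ordered by frequency
levels(by_freq)                            # "trt" "ctl"
relabeled <- fct_recode(groups, treatment = "trt", control = "ctl")
levels(relabeled)                          # "control" "treatment"
```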

A.8 Quiz 8

  • Review of previous weeks (especially tidyverse packages: dplyr, tidyr, ggplot2, stringr, and forcats)
  • Writing a formula with ~ syntax
  • Regression modeling with lm, glm
  • Using functions from broom to tidy model output (augment, tidy, glance)
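
As an illustration of the formula syntax and the broom functions, the sketch below fits two models to the mtcars dataset that comes with R. The choice of variables is arbitrary.

```r
library(broom)

# The formula reads "mpg as a function of wt and hp"
fit <- lm(mpg ~ wt + hp, data = mtcars)

tidy(fit)      # one row per model term (intercept, wt, hp)
glance(fit)    # one row of model-level summaries
augment(fit)   # one row per observation, with .fitted and .resid columns

# glm uses the same formula syntax; family = binomial fits a logistic regression
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
tidy(logit_fit)
```

All three broom functions return dataframes, so their output can be passed directly to dplyr and ggplot2 functions.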

A.9 Quiz 9

  • Nesting with group_by and nest and unnesting with unnest
  • A list-column in a dataframe
  • Review of previous quizzes
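
Nesting and unnesting can be sketched with the mtcars dataset that comes with R. Grouping by cyl is an arbitrary choice for the example.

```r
library(dplyr)
library(tidyr)

# One row per value of cyl; the other columns are packed into a list-column
nested <- mtcars %>%
  group_by(cyl) %>%
  nest()

nrow(nested)            # 3, one row per group
is.list(nested$data)    # TRUE: data is a list-column
nested$data[[1]]        # each element is itself a dataframe

# unnest() expands the list-column back into regular rows and columns
unnested <- nested %>%
  unnest(cols = data)

dim(unnested)           # 32 11
```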