A Appendix A: Vocabulary

You will be responsible for knowing the following functions and vocabulary for the weekly quizzes.

A.1 Quiz 1—R Preliminaries

Grading policies for the course
Course requirements / policies for in-class quizzes
Free and open source software
“Free as in beer” versus “free as in speech”
Advantages and disadvantages of interpreted languages compared to “point-and-click” software
Advantages and disadvantages of interpreted languages compared to compiled languages and assembly languages
Difference between R and RStudio
R session
R console
Function arguments (including required versus optional arguments)
Accessing a function’s helpfile using ?
Mathematical operators: +, -, *, /
R objects and object names
“gets arrow”: <-
Rules and style guidelines for naming objects
ls()
R scripts
# comment character
R packages
CRAN
Installing packages
install.packages()
Loading a package
library()
Types of package documentation: vignettes and helpfiles
vignette(), option package =
Vectors
c()
Two of the basic classes of vectors: character and numeric
class()
Square bracket indexing for vectors: [...]
Dataframes
tibble() function (from the tibble package; also exported by dplyr)
select and slice functions to extract values from dataframes
read_csv, option skip =
str()
summary()
dim()
ncol()
nrow()
Nate Silver
FiveThirtyEight
NA for missing values
$ to get a column from a dataframe
paste(), option sep =
paste0()
= vs. <- for assignment expressions
package::function() notation
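
Several of the terms above (the gets arrow, c(), class(), square bracket indexing, NA, $, paste(), and paste0()) can be tried out in a few lines of base R. This is only a minimal sketch: the object names (beatles, heights, band) and the height values are made up for illustration.

```r
# Create vectors with c() and assign them with the "gets arrow"
beatles <- c("John", "Paul", "George", "Ringo")
heights <- c(179, 180, NA, 170)   # made-up numbers; NA marks a missing value

class(beatles)   # "character"
class(heights)   # "numeric"

# Square bracket indexing pulls elements by position
beatles[1]         # first element
beatles[c(2, 4)]   # second and fourth elements

# A dataframe built from the two vectors; $ extracts one column
band <- data.frame(name = beatles, height = heights)
band$name
dim(band)    # number of rows, then number of columns
nrow(band)
ncol(band)

# paste() joins pieces using sep (a space by default); paste0() uses no separator
paste("Quiz", 1, sep = "_")   # "Quiz_1"
paste0("Quiz", 1)             # "Quiz1"
```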

What kinds of data can be read into R?
delimited files (csv, tsv)
fixed width files
delimiter
read_delim, options delim =, skip =, n_max =, col_names =
read_fwf
read_csv, options skip =, n_max =, col_names =
readxl package and its read_excel() function
haven package and its read_sas() function
NA
Computer directory structure
working directory
getwd()
list.files()
relative pathnames
absolute pathnames
shorthand for pathnames: ., .., ../data, etc.
Reading in data from either a local or online flat file
paste(), option sep =
paste0()
How to read flat files of data that are online directly into R
dplyr package
rename()
Why you might want to rename column names (e.g., uppercase, long, unusual characters)
select()
slice()
mutate()
filter()
arrange(), including with desc()
%>%, advantages of piping
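
The dplyr verbs listed above are often chained with %>%, so that each step reads top to bottom without intermediate objects. The sketch below assumes the dplyr package is installed; the dataframe daily and its column names and values are hypothetical.

```r
library(dplyr)

# A small made-up dataframe with awkward (mixed-case) column names
daily <- tibble(
  City = c("Denver", "Denver", "Boulder", "Boulder"),
  TEMP_F = c(41, 55, 38, 60)
)

warm_days <- daily %>%
  rename(city = City, temp_f = TEMP_F) %>%     # new_name = old_name
  mutate(temp_c = (temp_f - 32) * 5 / 9) %>%   # add a new column
  filter(temp_f > 40) %>%                      # keep rows that meet a condition
  arrange(desc(temp_f)) %>%                    # sort from warmest to coolest
  select(city, temp_c)                         # keep only some columns

warm_days

# slice() extracts rows by position
slice(warm_days, 1)
```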

Main types of vector classes in R: character, numeric, factor, date, logical
lubridate functions, including ymd, ymd_hm, mdy, wday, and mday
Common logical operators in R (==, !=, %in%, is.na(), &, |)
data() (with and without the name of a dataset as an option)
library() (with and without an argument in the parentheses)
logical vectors, including running sum on a logical vector
What the bang operator (!) does to a logical vector or expression
The tidyverse
min()
max()
mean()
median()
summarize()
Special functions to use with summarize(): n(), n_distinct(), first(), last()
Using group_by() before using summarize()
The three basic elements of a ggplot plot: data, aesthetics, and geoms
aes function and common aesthetics, including color, shape, x, y, alpha, size, and fill
Mapping an aesthetic to a column in the data versus setting it to a constant value
Some common geoms: geom_histogram, geom_point, geom_line, geom_boxplot
The difference between “statistical” geoms (e.g., geom_boxplot, geom_smooth) and “non-statistical” (e.g., geom_point, geom_line)
Common additions to ggplot objects: ggtitle, labs, xlim, ylim, expand_limits
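
As a rough illustration of three items above (summing a logical vector, using group_by() before summarize(), and mapping versus setting an aesthetic), here is a short sketch. It assumes dplyr and ggplot2 are installed; the scores data and all of its values are made up.

```r
library(dplyr)
library(ggplot2)

# Made-up data for illustration
scores <- tibble(
  section = c("A", "A", "B", "B", "B"),
  hours = c(2, 5, 1, 4, 6),
  score = c(70, 88, NA, 81, 93)
)

# Logical vectors: TRUE counts as 1, so sum() counts the TRUEs
is.na(scores$score)         # FALSE FALSE TRUE FALSE FALSE
sum(is.na(scores$score))    # 1 missing value
sum(!is.na(scores$score))   # ! flips TRUE and FALSE: 4 non-missing values

# group_by() before summarize() gives one row per group
section_summary <- scores %>%
  group_by(section) %>%
  summarize(n_students = n(),
            mean_score = mean(score, na.rm = TRUE),
            first_hours = first(hours))

# data + aesthetics + geom: color is mapped to a column inside aes(),
# while size is set to a constant outside aes()
score_plot <- ggplot(scores, aes(x = hours, y = score, color = section)) +
  geom_point(size = 3) +
  labs(x = "Hours studied", y = "Quiz score") +
  ggtitle("Made-up quiz scores")
```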

Guidelines for good graphics
Data density / data-to-ink ratio
Small multiples
Edward Tufte
Hadley Wickham
Where to put the + in ggplot statements to avoid problems (ends of lines instead of starts of new lines)
Can you save a ggplot object as an R object that you can reference later? If so, how would you add elements on to that object? How would you print it when you were ready to print the graph to your RStudio graphics window?
geom_hline(), geom_vline()
geom_text()
facet_grid(), facet_wrap()
grid.arrange() from the gridExtra package
ggthemes package, including theme_few() and theme_tufte()
Setting point color for geom_point() both as a constant (all points red) and as a way to show the level of a factor for each observation
size, alpha, color
Re-naming and re-ordering factors
Note: If you read this and find and bring in an example of a “small multiples” graph (from a newspaper, a website, an academic paper), you can get one extra point on this quiz
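
The questions above about saving a ggplot object, adding to it, and printing it can be explored with a sketch like the following. It assumes ggplot2 is installed and uses the mtcars dataset that ships with R; the object name car_plot and the reference line at 20 are arbitrary choices.

```r
library(ggplot2)

# Save the plot as an R object; each + sits at the END of a line
car_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", alpha = 0.6)   # constant color: set outside aes()

# Add elements to the saved object later
car_plot <- car_plot +
  geom_hline(yintercept = 20, linetype = 2) +   # horizontal reference line
  facet_wrap(~ cyl) +                           # small multiples by cylinder count
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Print the object to draw it in the graphics window
print(car_plot)
```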

Reproducible research, including what it is and advantages to aiming to make your research reproducible
R style guidelines on variable names, <- vs. =, line length, spacing, semicolons, commenting, indentation, and code grouping
Markup languages (concept and examples)
Basic conventions for Markdown (bold, italics, links, headers, lists)
Literate programming
What working directory R uses for code in an .Rmd document
Basic syntax for RMarkdown chunks, including how to name them
Options for RMarkdown chunks: echo, eval, message, warning, include, fig.width, fig.height, results
Difference between global options and chunk options, and which takes precedence
What inline code is and how to write it in RMarkdown
How to set global options
Why style is important in coding
RPubs

Three characteristics of tidy data
Five common problems with untidy data and how to resolve them (make sure you understand the examples shown, which you can find out more about in the Hadley Wickham paper I reference in the slides)
group_by with mutate, slice, and arrange
separate and unite
pivot_longer and pivot_wider
The *_join family of functions (left_join, right_join, inner_join, full_join, anti_join, semi_join)
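
To make the reshaping and joining functions above more concrete, here is a small sketch. It assumes dplyr and tidyr are installed; the dataframes wide and populations and their values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up "wide" data: the column headers hold values (years)
wide <- tibble(
  country = c("A", "B"),
  cases_2019 = c(10, 20),
  cases_2020 = c(15, 25)
)

long <- wide %>%
  pivot_longer(cols = c(cases_2019, cases_2020),
               names_to = "key", values_to = "cases") %>%
  separate(key, into = c("measure", "year"), sep = "_")   # split one column in two

# pivot_wider() reverses the reshaping
back_to_wide <- long %>%
  pivot_wider(names_from = year, values_from = cases)

# left_join() keeps every row of the left dataframe; unmatched rows get NA
populations <- tibble(country = c("A", "C"), pop = c(1000, 3000))
joined <- left_join(wide, populations, by = "country")
```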

regular expressions
difference in the output between str_extract and str_detect
str_to_lower, str_to_upper, str_to_title, str_trim
lists
indexing from lists ([[ and $)
exploring lists (class, names, str functions)
exploring lists with jsonedit from the listviewer package
tidy function from broom applied to the output from a statistical test function
forcats package, including fct_recode, fct_infreq, fct_reorder, and fct_lump
Bioconductor
“accessor” functions for Bioconductor (e.g., get_variable)
select with starts_with, select_if
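
The difference between str_detect and str_extract, and between [[ and [ for lists, is easiest to see side by side. The sketch below assumes the stringr package is installed; the strings in ids and the list study are made up.

```r
library(stringr)

ids <- c(" sample_12 ", "Control", "sample_7")   # made-up strings

str_detect(ids, "[0-9]+")    # logical vector: TRUE FALSE TRUE
str_extract(ids, "[0-9]+")   # the matched text: "12" NA "7"
str_trim(ids)                # removes leading and trailing whitespace
str_to_upper(ids)

# Lists can hold elements of different classes and lengths
study <- list(name = "pilot", doses = c(1, 5, 10), blinded = TRUE)
class(study)
names(study)
str(study)

study[["doses"]]   # [[ ]] returns the element itself (a numeric vector)
study$doses        # $ does the same, by name
study["doses"]     # single [ returns a list of length one
```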

Review of previous weeks (especially tidyverse packages: dplyr, tidyr, ggplot2, stringr, and forcats)
Writing a formula with ~ syntax
Regression modeling with lm, glm
Using functions from broom to tidy model output (augment, tidy, glance)
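
The modeling items above fit together as in the following sketch, which assumes the broom package is installed and uses the mtcars dataset that ships with R. The particular formulas are arbitrary examples, not models from the course.

```r
library(broom)

# Formula syntax: outcome ~ predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)

tidy(fit)      # one row per model term: estimate, std.error, statistic, p.value
glance(fit)    # one row for the whole model: r.squared, AIC, ...
augment(fit)   # one row per observation, with .fitted and .resid added

# glm() uses the same formula syntax plus a family argument
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
tidy(logit_fit)
```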