A Appendix A: Vocabulary
You will be responsible for knowing the following functions and vocabulary for the weekly quizzes.
A.1 Quiz 1—R Preliminaries
- Grading policies for the course
- Course requirements / policies for in-class quizzes
- Free and open source software
- “Free as in beer” versus “free as in speech”
- Advantages and disadvantages of interpreted languages compared to “Point-and-click” software
- Advantages and disadvantages of interpreted languages compared to compiled languages and assembly languages
- Difference between R and RStudio
- R session
- R console
- Function arguments (including required versus optional arguments)
- Accessing a function’s helpfile using ?
- Mathematical operators: +, -, *, /
- R objects and object names
- “gets arrow”: <-
- Rules and style guidelines for naming objects
- ls()
- R scripts
- # comment character
- R packages
- CRAN
- Installing packages
- install.packages()
- Loading a package
- library()
- Types of package documentation: vignettes and helpfiles
- vignette(), option package =
- Vectors
- c()
- Two of the basic classes of vectors: character and numeric
- class()
- Square bracket indexing for vectors: [...]
- Dataframes
- tibble() function from the dplyr package
- select and slice functions to extract values from dataframes
- read_csv, option skip =
- str()
- summary()
- dim()
- ncol()
- nrow()
- Nate Silver
- FiveThirtyEight
- NA for missing values
- $ to get a column from a dataframe
- paste(), option sep =
- paste0()
- = vs. <- for assignment expressions
- package::function() notation
A.2 Quiz 2—Entering / cleaning data #1
- What kinds of data can be read into R?
- delimited files (csv, tsv)
- fixed width files
- delimiter
- read_delim, options delim =, skip =, n_max =, col_names =
- read_fwf
- read_csv, options skip =, n_max =, col_names =
- readxl package and its read_excel() function
- haven package and its read_sas() function
- NA
- Computer directory structure
- working directory
getwd()
list.files()
- relative pathnames
- absolute pathnames
- shorthand for pathnames: ., .., ../data, etc.
- Reading in data from either a local or online flat file
- paste(), option sep =
- paste0()
- How to read flat files of data that are online directly into R
- dplyr package
- rename()
- Why you might want to rename column names (e.g., uppercase, long, unusual characters)
- select()
- slice()
- mutate()
- filter()
- arrange(), including with desc()
- %>%, advantages of piping
A.3 Quiz 3
- Main types of vector classes in R: character, numeric, factor, date, logical
- lubridate functions, including ymd, ymd_hm, mdy, wday, and mday
- Common logical operators in R (==, !=, %in%, is.na(), &, |)
- data() (with and without the name of a dataset as an option)
- library() (with and without an argument in the parentheses)
- logical vectors, including running sum on a logical vector
- What the bang operator (!) does to a logical operator
- The tidyverse
- min()
- max()
- mean()
- median()
- summarize()
- Special functions to use with summarize(): n(), n_distinct(), first(), last()
- Using group_by() before using summarize()
- The three basic elements of a ggplot plot: data, aesthetics, and geoms
- aes function and common aesthetics, including color, shape, x, y, alpha, size, and fill
- Mapping an aesthetic to a column in the data versus setting it to a constant value
- Some common geoms: geom_histogram, geom_point, geom_line, geom_boxplot
- The difference between “statistical” geoms (e.g., geom_boxplot, geom_smooth) and “non-statistical” (e.g., geom_point, geom_line)
- Common additions to ggplot objects: ggtitle, labs, xlim, ylim, expand_limits
A.4 Quiz 4
- Guidelines for good graphics
- Data density / data-to-ink ratio
- Small multiples
- Edward Tufte
- Hadley Wickham
- Where to put the + in ggplot statements to avoid problems (ends of lines instead of starts of new lines)
- Can you save a ggplot object as an R object that you can reference later? If so, how would you add elements on to that object? How would you print it when you were ready to print the graph to your RStudio graphics window?
- geom_hline(), geom_vline()
- geom_text()
- facet_grid(), facet_wrap()
- grid.arrange() from the gridExtra package
- ggthemes package, including theme_few() and theme_tufte()
- Setting point color for geom_point() both as a constant (all points red) and as a way to show the level of a factor for each observation
- size, alpha, color
- Re-naming and re-ordering factors
- Note: If you read this and find and bring in an example of a “small multiples” graph (from a newspaper, a website, an academic paper), you can get one extra point on this quiz
A.5 Quiz 5
- Reproducible research, including what it is and advantages to aiming to make your research reproducible
- R style guidelines on variable names, <- vs. =, line length, spacing, semicolons, commenting, indentation, and code grouping
- Markup languages (concept and examples)
- Basic conventions for Markdown (bold, italics, links, headers, lists)
- Literate programming
- What working directory R uses for code in an .Rmd document
- Basic syntax for RMarkdown chunks, including how to name them
- Options for RMarkdown chunks: echo, eval, message, warning, include, fig.width, fig.height, results
- Difference between global options and chunk options, and which takes precedence
- What inline code is and how to write it in RMarkdown
- How to set global options
- Why style is important in coding
- RPubs
A.6 Quiz 6
- Three characteristics of tidy data
- Five common problems with messy data and how to resolve them (make sure you understand the examples shown, which you can find out more about in the Hadley Wickham paper I reference in the slides)
- group_by with mutate, slice, and arrange
- separate and unite
- pivot_longer and pivot_wider
- The *_join family of functions (left_join, right_join, inner_join, full_join, anti_join, semi_join)
A.7 Quiz 7
- regular expressions
- difference in the output between str_extract and str_detect
- str_to_lower, str_to_upper, str_to_title, str_trim
- lists
- indexing from lists ([[ and $)
- exploring lists (class, names, str functions)
- exploring lists with jsonedit from the listviewer package
- tidy function from broom applied to the output from a statistical test function
- forcats package, including fct_recode, fct_infreq, fct_reorder, and fct_lump
- Bioconductor
- “accessor” functions for Bioconductor (e.g., get_variable)
- select with starts_with, select_if