The sf
package allows you to create a dataframe object with one special characteristic: a
special column
called “geometry” that contains the geometrical data needed to draw
an observation. For example, if you have a dataset of motor vehicle accidents, where each
row gives the data for an accident, the geometry
column might include the latitude and
longitude for the location of the accident. As another example, if you have a data set of
the total number of motor vehicle fatalities in a year in each county in a state, the
geometry
column might have all the latitude and longitude points to form the
boundary of each county.
These special dataframes are given a class called sf20 sf. Short for “simple features”,
the name of both an R package and the object class created and used by the data. This class
includes a special column called “geometry” for geographic information about each observation.
Objects with this class can be used for mapping spatial data, but also can be manipulated
using tidyverse tools very similarly to tibbles.
There are several ways for you to create this special type of dataframe. We’ll create a few
to use in the later mapping examples to demonstrate some of the ways to get an sf
object.
First, if you have a regular dataframe, you can
convert it into an sf
object, specifying which parts of the dataframe include geographical
information.
If you followed all the set-up instructions in the “Prerequisites”, you should have downloaded a
dataset called “fl_accidents.csv” and in the “data” subdirectory of the R Project directory
for the examples. You can use readr
to read it in. If you print out the start of it, you’ll see
that it’s got observations (rows) of fatal motor vehicle accidents. These accidents all occurred in
Florida within a week of Hurricane Irma’s landfall on September 10, 2017. The columns give
the county code (fips
), date (date
), location (latitude
and longitud
), and the number of
fatalities (fatals
).
library("readr")
fl_accidents <- read_csv("data/fl_accidents.csv")
fl_accidents
## # A tibble: 37 x 5
## fips date latitude longitud fatals
## <dbl> <date> <dbl> <dbl> <dbl>
## 1 12031 2017-09-08 30.2 -81.5 1
## 2 12095 2017-09-07 28.5 -81.4 1
## 3 12097 2017-09-08 28.3 -81.3 1
## 4 12095 2017-09-07 28.6 -81.2 1
## 5 12031 2017-09-08 30.2 -81.8 1
## 6 12033 2017-09-07 30.6 -87.4 2
## 7 12023 2017-09-10 30.1 -82.7 1
## 8 12075 2017-09-08 29.6 -82.9 1
## 9 12045 2017-09-09 30.1 -85.3 2
## 10 12031 2017-09-12 30.4 -81.8 1
## # … with 27 more rows
Even though this data has geographical information in it (latitude and longitude), it’s currently
just a regular dataframe. To convert it to an sf
class object, you can use the st_as_sf
function,21
Most of the functions in the st
package start with st_
. Since R studio allows tab completion,
this makes it very easy to look up a function in the package whose name you might have forgotten.
Just try typing st_
and then the Tab key in your R console—you should get a pop-up with a list of
suggestions for possible function names.
specifying the columns with the geogrphical coordinates using the coords
parameter.22 Here, the
compound pipe operator, %<>%
, applies the function to fl_accidents
and then overwrites the
fl_accidents
with the modified version, updating the object so you’re ready to use it for later
code. We also want to go ahead and set the projection to something reasonable for latitude-longitude
data; the “4326” code is a good pick. That can be set with the st_sf
function (for a lot more
on map projections, see the resources in “Learn More”).
library("sf")
fl_accidents %<>%
st_as_sf(coords = c("longitud", "latitude")) %>%
st_sf(crs = 4326)
Now, when you print out fl_accidents
, you’ll see some extra information at the top of the print-out,
including the objects bounding box (bbox
) and projection (epsg
, proj4string
).
fl_accidents
## Simple feature collection with 37 features and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -87.3797 ymin: 25.6876 xmax: -80.32332 ymax: 30.65894
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 37 x 4
## fips date fatals geometry
## <dbl> <date> <dbl> <POINT [°]>
## 1 12031 2017-09-08 1 (-81.5407 30.2474)
## 2 12095 2017-09-07 1 (-81.41927 28.51796)
## 3 12097 2017-09-08 1 (-81.34221 28.26149)
## 4 12095 2017-09-07 1 (-81.18188 28.57327)
## 5 12031 2017-09-08 1 (-81.84294 30.22846)
## 6 12033 2017-09-07 2 (-87.3797 30.56341)
## 7 12023 2017-09-10 1 (-82.71024 30.13608)
## 8 12075 2017-09-08 1 (-82.87318 29.56196)
## 9 12045 2017-09-09 2 (-85.25815 30.06198)
## 10 12031 2017-09-12 1 (-81.76086 30.37913)
## # … with 27 more rows
A second way to create an sf
object in R is to read in data from a geographic data file, like
a shapefile23 shapefile. A format for storing geographical data common for GIS. The data
will typically be stored in a directory (often zipped to a single file), with separate files
that give geographic data and other characteristics of the data.
If you followed the “Prerequisites”, you should have downloaded a shapefile as a subdirectory called
“al112017_best_track” in the “data” subdirectory.
I downloaded this as a zipped data file from the National Hurricane Center and unzipped it.
The shapefile includes shapefiles for the track of
Hurricane Irma.
If you look at the “al112017_best_track” directory, you will see that it contains a collection of files starting with one of several roots (e.g., “al112017_lin”) and with one of several suffixes (e.g., “.dbf”, “.prj”, “.shx”).
list.files("data/al112017_best_track/")
## [1] "al112017_lin.dbf" "al112017_lin.prj"
## [3] "al112017_lin.shp" "al112017_lin.shp.xml"
## [5] "al112017_lin.shx" "al112017_pts.dbf"
## [7] "al112017_pts.prj" "al112017_pts.shp"
## [9] "al112017_pts.shp.xml" "al112017_pts.shx"
## [11] "al112017_radii.dbf" "al112017_radii.prj"
## [13] "al112017_radii.shp" "al112017_radii.shp.xml"
## [15] "al112017_radii.shx" "al112017_windswath.dbf"
## [17] "al112017_windswath.prj" "al112017_windswath.shp"
## [19] "al112017_windswath.shp.xml" "al112017_windswath.shx"
Each set of files with the same root provides a “layer” of the shapefile, giving a set of geographic information. For example, the “al112017_lin." files provides the line of the hurricane’s track, while the "al112017_windswarth.” layer provides windswaths (how severe the wind was in certain locations surrounding the storm track).
You can read in any of these layers into R as an sf
object using st_read
and specifying
the layer you’d like from the shapefile directory with the layer
parameter
(put the root of the filenames
for the layer you want).
Since we’ll be plotting this with the accident data, we should transform the data’s
projection from its original projection to the one we set for the fl_accidents
data. You can do that with the st_transform
function.
For example, to read in a line with the track of irma, you can run:
irma_tracks <- st_read("data/al112017_best_track/",
layer = "al112017_lin") %>%
st_transform(crs = 4326)
## Reading layer `al112017_lin' from data source `/Users/georgianaanderson/Documents/r_workshops/navy_public_health/data/al112017_best_track' using driver `ESRI Shapefile'
## Simple feature collection with 20 features and 3 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -90.1 ymin: 16.1 xmax: -26.9 ymax: 36.8
## epsg (SRID): NA
## proj4string: +proj=longlat +a=6371200 +b=6371200 +no_defs
If you print out irma_tracks
, you can see that it looks like a dataframe, but with extra
geographic information (bounding box, projection, etc.) as well as a special column for
geometry, which gives the latitude and longitude of each accident.
irma_tracks
## Simple feature collection with 20 features and 3 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -90.1 ymin: 16.1 xmax: -26.9 ymax: 36.8
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## STORMNUM STORMTYPE SS geometry
## 1 11 Tropical Depression 0 LINESTRING (-26.9 16.1, -28...
## 2 11 Tropical Storm 0 LINESTRING (-28.3 16.2, -29...
## 3 11 Hurricane1 1 LINESTRING (-32.5 16.4, -33...
## 4 11 Hurricane2 2 LINESTRING (-34.2 17.1, -35...
## 5 11 Hurricane3 3 LINESTRING (-35.1 17.5, -36...
## 6 11 Hurricane2 2 LINESTRING (-42.6 18.9, -44...
## 7 11 Hurricane3 3 LINESTRING (-47.9 17.9, -49...
## 8 11 Hurricane4 4 LINESTRING (-53.9 16.7, -55...
## 9 11 Hurricane5 5 LINESTRING (-57.8 16.7, -59...
## 10 11 Hurricane4 4 LINESTRING (-73 21.5, -73.2...
Finally, there are some packages available now that allow you to read data, including spatial
data, directly into R from large open data databases, using R functions that wrap the database’s
Application Programming Interfaces (APIs). For example, the tigris
package lets you pull
spatial data into R directly from the US Census. To get spatial data for all the counties in
Florida, you can run:
library(tigris)
fl_counties <- counties(state = "FL", cb = TRUE, class = "sf")
If, for any reason, you’re not able to get the above code to work, I pulled and saved this data with the example data, and you can load it with the following code if you followed all the directions in the “Prerequisites” and are having problems with the previous code:
load("data/fl_counties.RData")