10:00
8 Mapping
8.1 Working with spatial data in R
Throughout this tutorial you will be using ESRI Health Facilities in Kenya data It covers all the health facilities in Kenya, with the agency they are run by, their facility type, and their location. This data was last updated in 2022.
We want to visualise, and better understand how many health facilities there are at the sub-county and county level, and what the distribution of the different types of facility there are. To achieve this we will have to import spatial data, manipulate it, create summary statistics, and then plot it.
The data has been tidied up and saved as a Comma Separated Value file (.csv). We can use read_csv
to open it in R.
8.1.1 Exercise
- Create a new object called
health
by usingread_csv()
. - Use
glimpse()
orhead()
to view the structure ofhealth
.
Solution
<- readr::read_csv("data/Health_facilities_-4126638174547585951.csv") health
Rows: 4867 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Facility Name, Province, District, Division, LOCATION, Sub Location...
dbl (8): FID, Facility Number, HMIS, Facility Type, Latitude, Longitude, x, y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Solution
health
# A tibble: 4,867 × 17
FID `Facility Number` `Facility Name` HMIS Province District Division
<dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 1 2 BARGONI HEALTH CENT… 3606 COAST LAMU HINDI
2 2 3 FAZA HEALTH CENTRE 626 COAST LAMU FAZA
3 3 4 HINDI MAGOGONI DISP 0 COAST LAMU HINDI
4 4 5 HINDI PRISON DISP 615 COAST LAMU HINDI
5 5 6 HONGWE CATHOLIC HEA… 616 COAST LAMU MPEKETO…
6 6 7 ICHUNDWA DISP 625 COAST LAMU KIZINGI…
7 7 8 KIUNGA HEALTH CENTRE 627 COAST LAMU KIUNGA
8 8 9 KIWAYU DISP 618 COAST LAMU KIZINGI…
9 9 10 KIZINGITINI DISP 617 COAST LAMU KIZINGI…
10 10 11 LAKE KENYATTA HEALT… 2451 COAST LAMU MPEKETO…
# ℹ 4,857 more rows
# ℹ 10 more variables: LOCATION <chr>, `Sub Location` <chr>, Spatial_Re <chr>,
# `Facility Type` <dbl>, Agency <chr>, Latitude <dbl>, Longitude <dbl>,
# GlobalID <chr>, x <dbl>, y <dbl>
health
is currently just a data frame - it has not got an explicit geometry column which links observations to their geographic location. It does however contain several columns which can be used to convert it into a spatial data format.
We could use the various names of locations to use as locations. But we would have to find a pre-existing map which had coordinates for each location. These locations could also be quite imprecise.
Fortunately we have also been provided with columns recording the Latitude, and Longitude of each incident. We can use those to convert health
into an sf
object. To achieve this we will use the st_as_sf()
function which takes the following arguments:
<- st_as_sf(x = input_data_frame, coords = c("x_coordinate_column", "y_coordinate_column"), crs = 27700) new_object
8.1.2 Exercise
10:00
- Create a new object called
health_sf
by convertinghealth
using thest_as_sf()
function. - Use
glimpse()
orhead()
to viewhealth_sf
structure.
Solution
<- sf::st_as_sf(health, coords = c("Longitude","Latitude"), crs = 4326)
health_sf head(health_sf)
Simple feature collection with 6 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 40.65767 ymin: -2.341627 xmax: 41.12397 ymax: -2.057423
Geodetic CRS: WGS 84
# A tibble: 6 × 16
FID `Facility Number` `Facility Name` HMIS Province District Division
<dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 1 2 BARGONI HEALTH CENTRE 3606 COAST LAMU HINDI
2 2 3 FAZA HEALTH CENTRE 626 COAST LAMU FAZA
3 3 4 HINDI MAGOGONI DISP 0 COAST LAMU HINDI
4 4 5 HINDI PRISON DISP 615 COAST LAMU HINDI
5 5 6 HONGWE CATHOLIC HEAL… 616 COAST LAMU MPEKETO…
6 6 7 ICHUNDWA DISP 625 COAST LAMU KIZINGI…
# ℹ 9 more variables: LOCATION <chr>, `Sub Location` <chr>, Spatial_Re <chr>,
# `Facility Type` <dbl>, Agency <chr>, GlobalID <chr>, x <dbl>, y <dbl>,
# geometry <POINT [°]>
st_as_sf()
converted the Latitude and Longitude columns to simple feature geometries and created a new column called geometry which holds spatial information for each row. Now that health
is a spatial object we can plot it using the ggplot2
package. For now we will use the geom_sf()
function which creates a quick map, using ggplot2's
default settings. geom_sf()
only needs to be supplied with a simple feature object and is very useful for quickly inspecting your data. To quickly plot multiple layers on the same map use geom_sf() + geom_sf()
.
8.1.3 Exercise
03:00
- Plot
health_sf
using thegeom_sf()
function.
Solution
::ggplot(health_sf) +
ggplot2::geom_sf() ggplot2
We can also create interactive maps using plotly
or leaflet
. For plotly
we can write the map using ggplot2, before executing ggplotly()
.
8.1.4 Exercise
03:00
- Make an interactive map of
health_sf
using theggplotly()
function.
Solution
<- ggplot2::ggplot(health_sf) +
ggplot_graph ::geom_sf()
ggplot2
::ggplotly(ggplot_graph) plotly
8.2 Filtering by Administration Code
Our health facilities dataset covers all of Kenya. Let’s look at solely locations in Nairobi to narrow down our exploration more. To remove all points outside of Nairobi we will have to import a shapefile with the right county level and then use it to spatially filter our health_sf
data.
So far we have created our own sf
objects by adding a geometry column. The kenya data set is already a spatial one and as such we can use the st_read()
function from the sf
package to import it. st_read
is extremely versatile and able to import most spatial data formats into R. The only argument that needs to be supplied to st_read
is the fullpath to the file
8.2.1 Exercise
10:00
- You will need to download the Kenya shapefile. This is called
ken_adm_iebc_20191031_SHP.zip
on the website. It will download a zip file, which you need to extract. You must extract the whole contents, as a shapefile is composed of multiple files. - Then, using
st_read
, read inken_admbnda_adm2_iebc_20191031.shp
, which is the shapefile containing the lowest level, subcounty boundaries. - Make a static map of the object you have just created using
geom_sf()
.
Solution
<- sf::st_read("data/admn2/ken_admbnda_adm2_iebc_20191031.shp", quiet = TRUE) kenya_2017
Sub-county level boundaries have loaded correctly but they currently cover all of Kenya. Because simple feature objects are data frames with a geometry column attached, any operations that we would perform on a normal data frame can also be performed on an object of class sf
. We will use dplyr::filter
and stringr::str_detect()
to only keep sub-counties where their ADM1_PCODE
code starts with KE047. KE047 denotes that an sub-county is part of Nairobi county.
8.2.2 Exercise
10:00
- Inspect kenya_2017 using
head()
orglimpse()
, and identify which column holds the county codes. - Create a new object called
nairobi_subcounty
. Usedplyr::filter
alongsidestringr::str_detect()
to only keep observations which have a Administrative code starting with “KE047”. - Plot
nairobi_subcounty
to see if the results look correct.
Solution
head(kenya_2017)
Simple feature collection with 6 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 34.04426 ymin: -1.016204 xmax: 36.24609 ymax: 0.5589468
Geodetic CRS: WGS 84
Shape_Leng Shape_Area ADM2_EN ADM2_PCODE ADM2_REF ADM2ALT1EN ADM2ALT2EN
1 1.7469864 0.04082931 Ainabkoi KE027144 <NA> <NA> <NA>
2 0.9173066 0.01995665 Ainamoi KE035190 <NA> <NA> <NA>
3 1.4026374 0.03799999 Aldai KE029152 <NA> <NA> <NA>
4 1.0813543 0.04935731 Alego Usonga KE041234 <NA> <NA> <NA>
5 0.7439150 0.02136547 Awendo KE044254 <NA> <NA> <NA>
6 0.7970553 0.02423416 Bahati KE032174 <NA> <NA> <NA>
ADM1_EN ADM1_PCODE ADM0_EN ADM0_PCODE date validOn ValidTo
1 Uasin Gishu KE027 Kenya KE 2017-11-03 2019-10-31 -001-11-30
2 Kericho KE035 Kenya KE 2017-11-03 2019-10-31 -001-11-30
3 Nandi KE029 Kenya KE 2017-11-03 2019-10-31 -001-11-30
4 Siaya KE041 Kenya KE 2017-11-03 2019-10-31 -001-11-30
5 Migori KE044 Kenya KE 2017-11-03 2019-10-31 -001-11-30
6 Nakuru KE032 Kenya KE 2017-11-03 2019-10-31 -001-11-30
geometry
1 MULTIPOLYGON (((35.35933 0....
2 MULTIPOLYGON (((35.26262 -0...
3 MULTIPOLYGON (((34.93989 0....
4 MULTIPOLYGON (((34.20727 0....
5 MULTIPOLYGON (((34.54577 -0...
6 MULTIPOLYGON (((36.20478 -0...
Solution
<- dplyr::filter(kenya_2017, stringr::str_detect(ADM1_PCODE, "KE047"))
nairobi_subcounty ::ggplot(nairobi_subcounty) +
ggplot2::geom_sf() ggplot2
Finally, for the next step, we only need the outer boundary of Nairobi - all the internal subcounty boundaries have to be removed and only the outer edges kept. sf
has a function exactly for this purpose called st_union()
. It only takes one argument, which is the sf
object we want to merge.
8.2.3 Exercise
10:00
- Create a new object called
nairobi_boundary
using thest_union
function. - Plot it to check the results.
Solution
<- sf::st_union(nairobi_subcounty)
nairobi_boundary ::ggplot(nairobi_boundary) +
ggplot2::geom_sf() ggplot2
8.3 Spatial subsetting and CRS
In addition to subsetting by value, as we did with the Nairobi boundaries earlier, we can also subset observations by evaluating their spatial relationship with another data set. We can for example select all counties intersected by a river, or all households outside of city boundaries. There are a number of different spatial relationships which can be tested and used to subset observations.
sf
has an inbuilt function called st_filter()
which we can use to spatially subset observations. The function takes several arguments:
- x -
sf
data frame we want to subset -health_sf
- y -
sf
object used to evaluate the spatial relationship -nairobi_boundary
Before running any spatial operations on two spatial objects it is always worth checking if their coordinate reference systems (CRS) match. sf
will throw an error if that’s not the case. Try it for yourself below.
8.3.1 Exercise
10:00
- Use
st_filter()
to spatially subsethealth_sf
by testing its relationship withnairobi_boundary
.
Solution
<- st_filter(x = health_sf, y = nairobi_boundary) health_sf_nairobi
You should have got an error here saying x st_crs(x) == st_crs(y) is not TRUE
. It means that objects x and y have different coordinate reference systems. Spatial operations require all objects to have the same CRS. We can see this for ourselves by running the st_crs()
function, which returns the coordinate reference system of an object.
8.3.2 Exercise
03:00
- Run
st_crs()
on both andhealth_sf
andnairobi_boundary
and compare the results.
Solution
::st_crs(health_sf)[[1]] sf
[1] "EPSG:4326"
Solution
::st_crs(nairobi_boundary)[[1]] sf
[1] "WGS 84"
st_crs()
provides detailed information about the CRS and projection of data, but all we need to check is its first element denoted by [[1]]
. We can see that health_sf
uses EPSG:4326
, while nairobi_boundary
is set to WGS 84
. This problem can be solved by transforming nairobi_boundary
’s CRS to match that of health_sf
, simply by using the correct EPSG code. To do so we will use the st_transform()
function which takes two arguments:
- x -
sf
object to be transformed - crs - EPSG code that we want to transform our data to - BNG is 4326
8.3.3 Exercise
10:00
- Run
st_transform()
to transform and overwritenairobi_boundary
. Remember to set the correct CRS. - Run
st_crs()
onhealth_sf
and newly transformednairobi_boundary
and compare the results.
Solution
<- sf::st_transform(nairobi_boundary, crs = 4326)
nairobi_boundary
::st_crs(health_sf)[[1]] sf
[1] "EPSG:4326"
Solution
::st_crs(nairobi_boundary)[[1]] sf
[1] "EPSG:4326"
Now that the CRS are matching we should be able to spatially subset health_sf
.
8.3.4 Exercise
10:00
- Use
st_filter
to spatially subsethealth_sf
by testing its relationship withnairobi_boundary
. Overwritehealth_sf
with the subset data. - Plot it to check if the results are correct - all points should be within Nairobi
Solution
<- sf::st_filter(x = health_sf, y = nairobi_boundary)
health_sf_filter <- ggplot2::ggplot() +
nairobi_map ::geom_sf(data = nairobi_boundary) +
ggplot2::geom_sf(data = health_sf_filter) ggplot2