5 checkpoints for GGPLOT Maps in R

Covid cases in Australia

Madhuka De Silva
Analytics Vidhya

Hi! I am a noob to R and data viz. This is me sharing what I figured out while trying out the cool stuff taught at an R-Ladies meetup in Sri Lanka, where data analyst Stephanie Kobakian presented “Around the world in 30 minutes” (map creation with R). Refer: https://srkcolombo.netlify.app/#1

This article walks through five ggplot tweaks:

01 | Reversing palette

02 | Labelling

03 | Gradients

04 | Rainbow or no rainbow palette

05 | Color deficiencies

Context: Covid cases in Australian regions

The code looks like this; the full code is available at the link above.

The part to focus on is the final ggplot() call, since that is what the later sections modify.

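library(tidyverse)  # ggplot2, dplyr, readr, purrr, tibble
library(ozmaps)     # ozmap_data(), ozmap_states
library(sf)         # needed for geom_sf()
library(knitr)      # kable()
library(polite)     # bow(), scrape()
library(rvest)      # html_table()

# World outline from the maps package (must be installed), filtered to Australia
world <- map_data("world")
Australia <- world %>% filter(region == "Australia")
Australia

ggplot(Australia) +
  geom_polygon(aes(x = long, y = lat, group = group))

# The same country outline as a simple-features (sf) object
sf_oz <- ozmap_data("country")
sf_oz %>% kable()
ggplot(sf_oz) + geom_sf()

# State and territory boundaries
sf_states <- ozmap_data("states")
sf_states %>% kable()
ggplot(sf_states) + geom_sf()

# Scrape the case table; the second table on the page holds the state totals
covid_url <- "https://covidlive.com.au/report/cases"
covid_data <- bow(covid_url) %>%
  scrape() %>%
  html_table() %>%
  purrr::pluck(2) %>%
  as_tibble()

# Expand abbreviations to match the NAME column of the map data,
# and turn the case counts into numbers
covid_data <- covid_data %>%
  mutate(STATE = case_when(
    STATE == "NSW" ~ "New South Wales",
    STATE == "WA" ~ "Western Australia",
    STATE == "SA" ~ "South Australia",
    STATE == "NT" ~ "Northern Territory",
    STATE == "ACT" ~ "Australian Capital Territory",
    TRUE ~ STATE)) %>%
  mutate(CASES = parse_number(CASES))

# Attach the case counts to the state polygons
covid_states <- left_join(ozmap_states, covid_data,
                          by = c("NAME" = "STATE"))

covid_states <- covid_states %>%
  filter(!(NAME == "Other Territories"))

ggplot(covid_states) +
  geom_sf(aes(fill = CASES))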

The resulting map looks like this,

A map of Australian regions showing covid cases indicated by a blue color based legend

Since the ggplot2 package was used, I got the map above with its default continuous color scale: a blue gradient that runs from dark (#132B43) for low values to light (#56B1F7) for high values.

01 | Color palette reverse? change?

Hmm, I felt that the higher the value, the darker the blue should be (varying luminance/hue), so I made a small change: the added scale_fill_gradient() call swaps the two ends of the default gradient.

ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  # Swap the default endpoints so that darker blue means more cases
  scale_fill_gradient(high = "#132B43",
                      low = "#56B1F7")

Then I got the map below, and now I can really see where the Covid cases are concentrated! (PS: There are other ways to reverse a palette, but I am not expert enough to apply them :P)

A map of Australian regions showing covid cases indicated by a blue color based legend but this time reversed, light to darker means low values to higher values.
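One of those other ways to reverse, as a rough sketch rather than the definitive approach: ggplot2's scale_fill_distiller() takes a direction argument, and direction = 1 keeps the ColorBrewer "Blues" palette in its own light-to-dark order. The toy data frame and its numbers below are made up for illustration (they are not the real Covid counts), and geom_tile() stands in for geom_sf() so the snippet runs without the map data.

```r
library(ggplot2)

# Hypothetical toy data standing in for covid_states (made-up numbers)
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  cases = c(10, 500, 5000, 20000)
)

p <- ggplot(toy, aes(x = state, y = 1, fill = cases)) +
  geom_tile() +
  # direction = 1 keeps ColorBrewer's own order for "Blues":
  # light for low values, dark for high values
  scale_fill_distiller(palette = "Blues", direction = 1)

# Inspect the colors ggplot2 actually assigned to each tile
fills <- layer_data(p, 1)$fill
fills
```

On the real map, the same scale should slot in after geom_sf(aes(fill = CASES)) in place of scale_fill_gradient().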

Huh! But then what am I explaining?? I can’t tell which region is high or low unless I am Australian or know for sure which region is which (:P)!

PS: I am from Sri Lanka, in case you did not read the intro!

02 | Monica’s label maker

So here I tried labelling (the new part is the geom_sf_label() line),

ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  scale_fill_gradient(high = "#132B43", low = "#56B1F7") +
  # Put each region's name on the map
  geom_sf_label(aes(label = NAME), colour = "black", size = 2.5)

Reference: https://yutani.rbind.io/post/geom-sf-text-and-geom-sf-label-are-coming/

PS: Both geom_sf_label() and geom_sf_text() are good options.

A map of Australian regions showing covid cases indicated by a blue color based legend and this time with labels on the regions.

Okay, I see that Victoria has around 20,000 Covid cases while New South Wales has between 5,000 and 10,000, but what about the others?

Am I the only one curious about the other regions? I mean, they have fewer than 5,000, but is it 0 or 4,999? So I tried the following,

03 | Gradients for a clearer picture

ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  # Greyscale gradient: white for the lowest count, black for the highest
  scale_fill_gradient(low = "white", high = "black") +
  geom_sf_label(aes(label = NAME), colour = "black", size = 2.5)

A map of Australian regions showing covid cases indicated by a grey scale based legend and with labels on the regions.

or maybe one like this,

ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  # Five-color RGB rainbow spread evenly across the range of CASES
  scale_fill_gradientn(
    colours = rainbow(5),
    values = NULL,
    space = "Lab",
    na.value = "grey50",
    guide = "colourbar",
    aesthetics = "fill") +
  geom_sf_label(aes(label = NAME), colour = "black", size = 2.5)

A map of Australian regions showing covid cases indicated by a rainbow color scale based legend and with labels on the regions.

Reference on color gradients: https://www.datanovia.com/en/blog/ggplot-colors-best-tricks-you-will-love/

The orange-ish shade makes it clear that Queensland, Western Australia and South Australia are not at zero, but what about the red regions? Well, I am the noob, you tell me! I mean, should we try to show the value on the map, or does it really matter to know the exact number of Covid cases in every region when all we have is a color palette?
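If the exact numbers do matter, two options come to mind, sketched here under stated assumptions rather than as the right answer: print the count on each region, and fill by the logarithm of the count so that small values no longer collapse into a single shade. The toy data frame below is made up for illustration, and geom_tile() and geom_label() stand in for geom_sf() and geom_sf_label() so the snippet runs without the map data.

```r
library(ggplot2)

# Hypothetical toy data standing in for covid_states (made-up numbers)
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  cases = c(10, 500, 5000, 20000)
)

p <- ggplot(toy, aes(x = state, y = 1)) +
  # Fill by log10(cases) so small counts get visibly different shades
  geom_tile(aes(fill = log10(cases))) +
  # Print the exact count on each tile so nobody has to read it off the legend
  geom_label(aes(label = cases)) +
  scale_fill_gradient(
    low = "white", high = "black",
    name = "cases",
    labels = function(x) round(10^x)  # show the legend in original units
  )

p
```

On the real map, a label such as paste0(NAME, "\n", CASES) inside geom_sf_label() would show the name and the count together. A region with zero cases would need something like log10(CASES + 1), because log10(0) is -Inf.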

04 | I hate Ross, I love Ross (Ross = Rainbows)

And oh, by the way, it is not wise to use a rainbow palette for continuous data: the spectrum is perceptually non-uniform (among many other perceptual reasons), so it can mislead and confuse. So don’t be mad at me for using it here, I just experimented ;)

BUT then again, if you read the reference paper closely, it states that,

While the RGB rainbow() is very unbalanced, the HCL rainbow_hcl() (or also qualitative_hcl()) is (by design) balanced with respect to luminance.

Meaning, if you really want to use a rainbow, you now have a better version: rainbow_hcl(). It looks like this,

A map of Australian regions showing covid cases indicated by a rainbow color scale based legend and with labels on the regions but this time the rainbow is HCL based instead of RGB.

Reference: Paper on “colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes” https://arxiv.org/pdf/1903.06490.pdf
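To see what "balanced with respect to luminance" means in numbers, here is a small base-R sketch. It does not call rainbow_hcl() itself: instead it uses hcl() with evenly spaced hues at fixed chroma 50 and luminance 70, which I am assuming mirrors rainbow_hcl()'s defaults. The helper name luminance() is made up for this example.

```r
# Five colors from the RGB rainbow used in section 03
rgb_rainbow <- rainbow(5)

# Five evenly spaced hues at fixed chroma and luminance (assumed to
# mirror the defaults of colorspace::rainbow_hcl(), c = 50 and l = 70)
hcl_rainbow <- hcl(h = seq(0, 288, length.out = 5), c = 50, l = 70)

# Hypothetical helper: perceptual luminance (L*) of each color
luminance <- function(cols) {
  srgb <- t(col2rgb(cols)) / 255  # one row per color, values in [0, 1]
  convertColor(srgb, from = "sRGB", to = "Luv")[, 1]
}

# The RGB rainbow jumps up and down in luminance...
round(luminance(rgb_rainbow))
# ...while the HCL version stays essentially flat
round(luminance(hcl_rainbow))
```

With the colorspace package installed, colours = rainbow_hcl(5) should drop into the scale_fill_gradientn() call from section 03 in place of rainbow(5).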

05 | Traffic light is not for everyone

Finally, and most importantly (since I am passionate about inclusive design), I tried something like this: a different color palette that works better for people with color vision deficiencies (CVD).

The colorspace package in R provides the simulate_cvd() function,

which can take any vector of valid R colors and transform them according to a certain CVD transformation matrix and transformation equation. The convenience interfaces deutan(), protan(), and tritan() are the high-level functions for simulating the corresponding kind of color blindness with a given severity (calling simulate_cvd() internally)

Reference: http://colorspace.r-forge.r-project.org/articles/color_vision_deficiency.html
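As a minimal sketch of those convenience functions (assuming the colorspace package is installed), the RGB rainbow colors from section 03 can be passed straight through deutan(), protan() and tritan(). The data frame is only there to print the results side by side.

```r
library(colorspace)

# The same five RGB rainbow colors used for the map in section 03
rgb_rainbow <- rainbow(5)

# Simulate three kinds of color vision deficiency at full severity
cvd <- data.frame(
  original = rgb_rainbow,
  deutan   = deutan(rgb_rainbow, severity = 1),
  protan   = protan(rgb_rainbow, severity = 1),
  tritan   = tritan(rgb_rainbow, severity = 1)
)

# Each row shows one color and its simulated appearance
print(cvd)
```

Lowering severity towards 0 simulates milder forms of the same deficiency.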

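So if somebody used the usual RGB rainbow palette, this is how the map would look to people with color vision deficiencies,

library(cowplot)
library(colorspace)
library(colorblindr)

# Store the RGB rainbow map from section 03 ...
gcovid <- ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  scale_fill_gradientn(colours = rainbow(5)) +
  geom_sf_label(aes(label = NAME), colour = "black", size = 2.5)

# ... and draw it under the simulated color vision deficiencies
cvd_grid(gcovid)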

A map of Australian regions showing covid cases indicated by an RGB rainbow color scale based legend and with labels on the regions.
A set of 4 maps of Australian regions showing how the covid cases map looks under different color vision deficiencies when the RGB rainbow palette is used.

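Again, HCL-based palettes are FTW (For The Win)!! They stay more readable under many color deficiencies. The HCL palette Geyser gives the following,

ggplot(covid_states) +
  geom_sf(aes(fill = CASES)) +
  # Eleven colors from colorspace's HCL-based "Geyser" palette, reversed
  scale_fill_gradientn(
    colours = divergingx_hcl(11, "Geyser", rev = TRUE),
    values = NULL,
    space = "Lab",
    na.value = "grey50",
    guide = "colourbar",
    aesthetics = "fill") +
  geom_sf_label(aes(label = NAME), colour = "black", size = 2.5)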

A map of Australian regions showing covid cases indicated by a HCL rainbow color scale based legend and with labels on the regions.

So, again, if we demo how it looks with the color deficiencies,

A set of 4 maps of Australian regions showing how the covid cases map looks under different color vision deficiencies when the HCL rainbow palette is used.

So what do you think? I know you got more to add here! Let me know :)

Madhuka De Silva

Inclusive Design & Technologies | PhD Researcher at Monash University