
I've been learning how to build maps for work showing COVID-19 infection data. In addition to a national map, I am producing regional maps for the northeast, south, west, and midwest. The code is identical; I just apply different filters to the national data frame. I produce one map for each day of data, use gganimate to make the frames, and finally use gifski to make an animated GIF. The basic code for the national map is:

p <- pop_counties_cov %>%
  ggplot() +
  geom_sf(mapping = aes(fill = infRate, geometry=geometry), color = NA) +
  geom_sf(data = states_sf, fill = NA, color = "black", size = 0.25) +
  coord_sf(datum = NA) +   
  scale_fill_gradient(name = "% Population Infected", trans = "log", low='green', high='red',
                      na.value = "white",
                      breaks=c(0, round(max(pop_counties_cov$infRate),3))) +
  geom_point(data=AFMCbases, aes(x=longitude.1, y=latitude.1,size=personnel), color = "hotpink") +
  #geom_label_repel(data=AFMCbases, aes(x=longitude.1, y=latitude.1, label=Base)) +
  theme_bw() + 
  labs(size='AFMC \nMil + Civ') +
  theme(legend.position="bottom", 
        panel.border = element_blank(),
        axis.title.x=element_blank(), 
        axis.title.y=element_blank())

As an example, the final frame it produces is

enter image description here

This code on the other hand:

p <- mw_pop_counties_cov %>%
  ggplot() +
  geom_sf(mapping = aes(fill = infRate, geometry=geometry), color = NA) +
  geom_sf(data = mw_states_sf, fill = NA, color = "black", size = 0.25) +
  coord_sf(datum = NA) +   
  scale_fill_gradient(name = "% Population Infected", trans = "log", low='green', high='red',
                      na.value = "white",
                      breaks=c(0, round(max(mw_pop_counties_cov$infRate),3))) +
  geom_point(data=mwBases, aes(x=longitude.1, y=latitude.1,size=personnel), color = "hotpink") +
  #geom_label_repel(data=AFMCbases, aes(x=longitude.1, y=latitude.1, label=Base)) +
  theme_bw() + 
  labs(size='AFMC \nMil + Civ') +
  theme(legend.position="bottom", 
        panel.border = element_blank(),
        axis.title.x=element_blank(), 
        axis.title.y=element_blank())

which is the same except that the data frames have been filtered down to only the midwest states, produces

enter image description here

Note the appearance of the color scale.

Is there some typo I'm not seeing because I've been staring at this too long?
My script produces 5 animations, 1 each for the national map, then the 4 census regions (northeast, midwest, south and west). The color scale appears on the west and midwest maps, but not the other three. This is despite the fact I basically just cut and pasted and then changed the dataframe names.

What am I doing wrong? I WANT the color scale to appear on ALL maps.

  • 1
    Have you tried passing `show.legend = FALSE` to `geom_sf`? Can you provide a reproducible example of your dataset? – dc37 Mar 30 '20 at 03:14
  • Hello @jerH, isn't it because of the proportions? I mean, you compare a country vs a region and you use a log transformation. – Manu Mar 30 '20 at 03:15
  • @dc37 I'm not exactly sure how to provide a reproducible example...the covid data I read from a URL, but the county population data is from a rather large .csv I only have locally. If there's a good way to do that please let me know. Also, I'll edit the OP to make it clear that I WANT the color scale to appear..I don't know why it isn't in some cases. – jerH Mar 30 '20 at 03:39
  • 1
    I see. Sorry for the mis-understanding. – dc37 Mar 30 '20 at 03:41
  • 1
    @Manu In each case, I'm color-coding each county by the percent of its population that has tested positive, so the range is [0,1] regardless. But for the scale I use the max infection rate in the respective data frame. For example, in the midwest map it's `breaks=c(0, round(max(mw_pop_counties_cov$infRate),3)))` so I take the max of the midwest counties' rates, while in the national map it's `breaks=c(0, round(max(pop_counties_cov$infRate),3)))` so it's the max of all counties. The log transform is just cuz the data is very skewed... – jerH Mar 30 '20 at 03:41
  • This looks like a sizing issue to me - you have 4 pink dots on the first map, and only 3 on the second. Maybe that does not leave enough room for the color scale at the given pixel size? Try changing legend size - see https://stackoverflow.com/questions/15059093/ggplot2-adjust-the-symbol-size-in-legends for how to do that. – user12728748 Mar 30 '20 at 04:12
  • @user12728748 you may be on to something...I changed to rounding to 2 decimal places instead of 3 and the scale re-appeared on some of the charts...remained missing on a couple though. The weird thing though is that I used to not round at all and the scale appeared on all of them, just out to like 5 or 6 decimal places. Would have thought rounding down to 3 wouldn't have been an issue. I'd actually like to format them as %'s (so the midwest map above would have a scale from 0 to 0.132% instead of just 0.132) but don't know how to do that either! – jerH Mar 30 '20 at 05:26
  • Do you need a log transform for the proportion infected? And I wonder if you should make the maximum the same across all maps for better comparison. – Edward Mar 30 '20 at 05:26
  • @Edward you're probably right...personally I'm not a huge fan of this % infected map anyway, but somebody above me in the food chain wants it. I'll give it a run without and see how it looks. I actually want to wrap this up and move onto a) trying to project #cases forward and b) producing a version of this [this](https://i0.wp.com/kieranhealy.org/files/misc/cov_case_sm.png?w=456&ssl=1) map based around the locations highlighted by the pink dots... – jerH Mar 30 '20 at 05:30
  • 1
    I don't mean rounding of values, but the physical horizontal space on your map. If you changed the gradient name from `% Population Infected` to `% PI` just to try if that is the issue, you might fit the scale on the first plot as well. – user12728748 Mar 30 '20 at 05:41
  • 1
    @jerH u still struggling? If so, please make the code reproducible - meaning link to the url where you found the data, and also the steps to filter to the different data frames. Include all packages loaded. To make a reproducible example, best is to use the `reprex` package (and use RStudio). Install it and it will be fully integrated in RStudio. Mark all your code, click the 'addins' button and 'render reprex' – tjebo Mar 30 '20 at 09:18
  • 2
    @Tjebo Thanks for the info on `reprex`, I'll check it out. Big issue for me was that I pretty much put the county population data together by hand because I had to manually fix a bunch of things to ensure consistency between the sources...like it's "City of New York" in one source and "New York City" in another. As for the original problem, a simple fix turned out to be adding a newline character in the scale label. Once I made it `scale_fill_gradient(name = "% Population \nInfected"...` it worked just fine. – jerH Mar 30 '20 at 12:13
  • @Edward tried it without the log transform and you basically wind up with an almost solid green map... – jerH Mar 30 '20 at 12:14
  • 1
    I would suggest adding this as an answer for future generations. You received some upvotes, which shows that some people found the problem interesting, so it may help future researchers. – tjebo Mar 30 '20 at 12:26
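
On the side question raised in the comments about showing the scale as percentages: one possible approach is to pass a labelling function to the `labels` argument of `scale_fill_gradient`. The sketch below is only an illustration. The `toy` data frame and the `pct_label` helper are made up for this example, `geom_tile` stands in for the county `geom_sf` layer, and it assumes `infRate` is already in percent units (so 0.132 means 0.132%).

```r
library(ggplot2)

# Hypothetical toy data; infRate is assumed to already be in percent units
toy <- data.frame(x = 1:4, y = 1, infRate = c(0.001, 0.01, 0.05, 0.132))

# Hypothetical helper: append a percent sign to each break value
pct_label <- function(b) paste0(b, "%")

p <- ggplot(toy, aes(x = x, y = y, fill = infRate)) +
  geom_tile() +
  scale_fill_gradient(
    name = "Population \nInfected",
    trans = "log", low = "green", high = "red",
    breaks = range(toy$infRate),  # smallest and largest observed rate
    labels = pct_label            # ggplot2 calls this on the break values
  ) +
  theme(legend.position = "bottom")

pct_label(0.132)  # "0.132%"
```

If `infRate` were stored as a proportion instead, the helper would need to multiply by 100 before pasting the sign.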

1 Answer


Turns out the issue does appear to be one of horizontal space on the output plot. The fix was simply to change the scale caption from

scale_fill_gradient(name = "% Population Infected",...

to

scale_fill_gradient(name = "% Population \nInfected",

Note the newline character \n.

The output map now has the color scale I was looking for:

enter image description here
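
For anyone who wants to try the fix without the county data, here is a minimal sketch. The `toy` data frame is invented for illustration and `geom_tile` stands in for the `geom_sf` layers, so it only demonstrates the two-line legend title, not the maps themselves.

```r
library(ggplot2)

# Hypothetical toy data standing in for the county infection rates
toy <- data.frame(x = 1:4, y = 1, infRate = c(0.001, 0.01, 0.05, 0.132))

# The "\n" splits the title over two lines, so it takes less horizontal room
legend_title <- "% Population \nInfected"

p <- ggplot(toy, aes(x = x, y = y, fill = infRate)) +
  geom_tile() +
  scale_fill_gradient(
    name = legend_title,
    trans = "log", low = "green", high = "red", na.value = "white",
    breaks = range(toy$infRate)
  ) +
  theme_bw() +
  theme(legend.position = "bottom")
```

Whether the legend fits still depends on the output width and on the other legends sharing the bottom row, so a narrower output device may need further trimming.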
