
I am pretty new to R programming and I need help with plotting flags using the ggflags package.

If you run the code below, it will create a single graph of density plots, box plots, and swarm plots. There are two purple dots in each 5-year period, which stand for China and India (the two most populous countries).

What I need is to replace these purple dots with the flags of China and India, with the country code shown next to each flag.

I tried the ggflags package, but I could not figure out how to use its geom_flag() with the points created by geom_quasirandom().

The two relevant lines are:

```r
geom_quasirandom(aes(color = color_custom, alpha = alpha_custom, size = size_custom), width = 0.20)
```

This creates the points of the swarm plot, including the purple dots.

```r
geom_flag(aes(country = CodeISO2C), size = 3)
```

This uses the country codes to draw flags. However, when I uncomment this line, the following error occurs:

```
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘grobify’ for signature ‘"NULL"’
```
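One likely cause (an assumption, not confirmed anywhere in this thread): the country-code column is `NA` for every country that is not highlighted, and geom_flag() has no flag to draw for `NA`, so it ends up with a `NULL` grob. A minimal sketch that hands geom_flag() only the rows that actually have a code; the data frame `toy` and its values are made up for illustration:

```r
library(dplyr)
library(ggplot2)
library(ggflags)  # devtools::install_github("jimjam-slam/ggflags")

# Hypothetical toy data: two periods, five countries each, only two with a flag code
toy <- data.frame(
  Period      = rep(c("1950-1955", "1955-1960"), each = 5),
  NaturalRate = c(20, 15, 25, 30, 10, 22, 18, 27, 31, 12),
  CodeISO2C   = rep(c("cn", "in", NA, NA, NA), times = 2),
  stringsAsFactors = FALSE
)

# Keep only the rows that have a country code for the flag layer
flag_rows <- filter(toy, !is.na(CodeISO2C))

p <- ggplot(toy, aes(x = Period, y = NaturalRate)) +
  geom_point(color = "grey60") +                              # all countries
  geom_flag(data = flag_rows, aes(country = CodeISO2C), size = 6)  # flagged ones only
```

The point layer still sees every country, while the flag layer never receives an `NA` code.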

If it is not possible to replace the purple dots with flags, please tell me how to give China and India two different colors and add a legend, still using a single function call: ComboGraph(country_data['isMostPopulous'],'none')
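For the two-color fallback, one common ggplot2 pattern is a grouping column whose value is the country name for the highlighted countries and `"Other"` for everyone else, combined with scale_color_manual() whose `breaks` list only the highlighted names, so only they get legend keys. This is a simplified sketch on hypothetical toy data (it does not reproduce the per-period colors of the original code, and the hex colors are arbitrary); note that the original theme sets `legend.position = 'none'`, which would have to change for any legend to show:

```r
library(ggplot2)
library(ggbeeswarm)

# Hypothetical toy data: rates for a few countries in two periods
toy <- data.frame(
  Country     = rep(c("China", "India", "A", "B", "C"), times = 2),
  Period      = rep(c("1950-1955", "1955-1960"), each = 5),
  NaturalRate = c(20, 15, 25, 30, 10, 22, 18, 27, 31, 12),
  stringsAsFactors = FALSE
)

# One color group per highlighted country; everything else is "Other"
toy$highlight <- ifelse(toy$Country %in% c("China", "India"), toy$Country, "Other")

p <- ggplot(toy, aes(x = Period, y = NaturalRate, color = highlight)) +
  geom_quasirandom(width = 0.2) +
  scale_color_manual(
    values = c(China = "#de2910", India = "#ff9933", Other = "grey70"),
    breaks = c("China", "India"),  # only these two keys appear in the legend
    name   = NULL
  ) +
  theme(legend.position = "top")
```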

Thank you very much for your help.

The data source file can be downloaded from: https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx

Here is my full code:

```r
# ****************************************************************
# Visualization of natural population rate in the world
# with density plots, box plots, and swarm plots in a single graph
# ****************************************************************


if (!require("dplyr")) {
  install.packages("dplyr")
}

if (!require("devtools")) {
  install.packages("devtools")
}

# flags in round shape
if (!require("ggflags")) {
  devtools::install_github("rensa/ggflags")
}

# flags in rectangular form 
# if (!require("ggflags")) {
#   devtools::install_github("ellisp/ggflags")
# }

if (!require("countrycode")) {
  install.packages("countrycode")
}

library(dplyr) # to use the pipe operator %>%
library(tidyverse)
library(readxl)
library(ggdist)
library(ggbeeswarm)
library(ggtext)
library(ggflags)
library(countrycode)

# import data from the 'ESTIMATES' sheet and skip the first 16 rows
country_data <- read_excel('WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx', 
                           sheet = 'ESTIMATES', skip = 16) %>%
  
  # only select rows with 'Country/Area' value in the 'Type' column
  dplyr::filter(Type == 'Country/Area') %>% 
  
  # select columns 1, 3, 6, 8 to 21
  select(c(1, 3, 6, 8:21)) %>% 
  # now the table has 17 columns
  
  # reshape to long format: the 14 period columns (4 to 17)
  # become 2 columns, 'Period' and 'NaturalRate'
  pivot_longer(4:17, names_to = 'Period', values_to = 'NaturalRate') %>% 
  
  # mark countries by continent and max population
  # mutate: add new variables and preserve existing ones
  mutate(
    NaturalRate = as.numeric(NaturalRate),
    isAfrica = Index >= 27 & Index <= 88,
    isAsia = Index >= 90 & Index <= 146,
    isLatAmCar = Index >= 149 & Index <= 188,
    isOceania = Index >= 190 & Index <= 206,
    isEurope = Index >= 210 & Index <= 252,
    isNorAm = Index >= 254 & Index <= 255,
    isMostPopulous = Index == 127 | Index == 119
  )

# mark countries by max and min rate in each period
country_data <- country_data %>%
  group_by(Period) %>%
  mutate(
    maxPeriodRate = max(NaturalRate),
    minPeriodRate = min(NaturalRate)
  ) %>%
  mutate(isMaxMinPeriodRate = NaturalRate == maxPeriodRate | NaturalRate == minPeriodRate) %>%
  ungroup() # always ungroup after group_by to prevent future errors

# rename the second column (the country name) to 'Country'
names(country_data)[2] = 'Country'

# add new columns of two-letter country codes for certain cases
# Note: the Channel Islands have no single ISO 3166-1 code (Jersey is JE, Guernsey is GG),
# so they are mapped to "GB" here
country_data <- country_data %>%
  mutate(CodeISO2C_NorAm = ifelse(isNorAm == TRUE, 
                                  tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
                                                      custom_match = c('Channel Islands' = 'GB'))),
                                  NA)) %>%
  mutate(CodeISO2C_Populous = ifelse(isMostPopulous == TRUE,
                                     tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
                                                         custom_match = c('Channel Islands' = 'GB'))),
                                     NA)) %>%
  mutate(CodeISO2C_MaxMin = ifelse(isMaxMinPeriodRate == TRUE,
                                   tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
                                                       custom_match = c('Channel Islands' = 'GB'))),
                                   NA))

# count the number of distinct periods
total_periods = n_distinct(country_data$Period)

# choose the range of colors for the density plots
column_colors <- colorRampPalette(c("#8ecae6", "#219ebc"))(total_periods)

# choose other colors
background_color = '#fffef7'
country_color = '#800080'
world_color = '#5DB6D3'

# create a function to make density plots, box plots, and swarm plots (strip plots) in the same graph
ComboGraphWithFlag <- function(boolean_column){
  # temporarily create 3 columns: size_custom, color_custom, alpha_custom to plot points of the selected group
  country_data <- mutate(country_data, size_custom = ifelse(boolean_column == TRUE, 0.8, 0.5))
  country_data <- mutate(country_data, color_custom = ifelse(boolean_column == TRUE, 'highlight', Period))
  country_data <- mutate(country_data, alpha_custom = ifelse(boolean_column == TRUE, 0.9, 0.5))
  
  if(colnames(boolean_column) == 'isNorAm') {
    region_group = 'Northern America'
    code_col = country_data$CodeISO2C_NorAm
  } else if(colnames(boolean_column) == 'isMostPopulous') {
    region_group = 'two most populous countries'
    code_col = country_data$CodeISO2C_Populous
  } else if(colnames(boolean_column) == 'isMaxMinPeriodRate') {
    region_group = 'countries of max and min rates'
    code_col = country_data$CodeISO2C_MaxMin
  }
  print(paste0('Generating the graph of ', region_group))
  
  ggplot(country_data, aes(x = Period, y = NaturalRate)) +
    geom_boxplot(fill = 'transparent', width =  0.4, color = 'orange',
                 outlier.shape = NA, alpha = 0.8, coef = 0) +
    geom_quasirandom(aes(color = color_custom, alpha = alpha_custom, size = size_custom), width = 0.20) +
    stat_halfeye(aes(fill = Period), color = 'orange', justification = -0.6, 
                 width = 0.4, .width = 0, alpha = 0.9) +
    geom_text(data = unique(country_data[, c('Period')]), 
              aes(y = 45, label = Period), color = column_colors, size = 2.7, family = 'Arial') +
    geom_flag(aes(country = code_col), size = 3, position = position_quasirandom()) +
    annotate('text', y = -16, x = 5.5, label = 'Cambodian genocide', 
             family = 'Arial', color = 'grey50', size = 3)+
    geom_curve(data = NULL, aes(x = 5.45, y = -17.5, xend = 5.9, yend = -21),
               arrow = arrow(length = unit(0.02, "npc")), color = 'grey60', size = 0.35) +
    scale_color_manual(values = c(column_colors, country_color)) +
    scale_alpha_identity() +
    scale_size_identity() +
    scale_fill_manual(values = column_colors) +
    labs(title = paste0("Natural population rate in <span style='color:", country_color, "'>", region_group, 
                        "</span> and <span style='color:", world_color, "'>the rest of the world </span>"), 
         subtitle = 'Natural population rate = crude birth rate - crude death rate, per 1000 population',
         caption = "Source: Rate of Natural Population Increase - Population Division, United Nations
                    \nhttps://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx", 
         y = 'Rate of natural population increase, per 1000 population') +
    theme_void() +
    theme(
      text = element_text('Arial', color = 'grey50'),
      plot.background = element_rect(fill = background_color, color = background_color),
      plot.margin = margin(0.8, 0.8, 0.8, 0.8, 'cm'),
      plot.title = element_markdown(face = 'bold', size = 16, margin = margin(b = 4)),
      plot.subtitle = element_text(colour = 'grey60', margin = margin(b = 7.5)),
      plot.caption = element_text(colour = 'grey40', size = 8),
      legend.position = 'none',
      axis.line.y = element_line(colour = 'grey50'),
      axis.title.y = element_text(angle = 90, margin = margin(r = 10), size = 10), 
      axis.text.y = element_text(color = 'grey70', size = 9, margin = margin(r = 5)), 
      panel.grid.major.y = element_line(colour = 'grey80', linetype = 'dotted')
    )
  ggsave(paste0('Population rate in ', region_group,'.png'), width = 10, height = 6)
  print('Graph making is completed.')
}

# make plots with flags for certain cases
ComboGraphWithFlag(country_data['isNorAm'])
ComboGraphWithFlag(country_data['isMostPopulous'])
ComboGraphWithFlag(country_data['isMaxMinPeriodRate'])
```

enter image description here

enter image description here

Thuc Dao

1 Answer

ggbeeswarm comes with a few new "positions", such as position_quasirandom() and position_beeswarm(): these are objects that define the offset of each plotted point. You can pass the same position to geom_flag().

I have not used your example because it seems quite involved. I hope the underlying concept becomes clear from the example below.

```r
library(ggbeeswarm)
#> Loading required package: ggplot2
# devtools::install_github("jimjam-slam/ggflags")
library(ggflags)

ggplot(iris) +
  geom_flag(aes(x = Species, y = Sepal.Length, country = "fr"), 
            position = position_quasirandom())
```

Created on 2022-06-08 by the reprex package (v2.0.1)
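Extending the same idea, a sketch that draws points with geom_quasirandom() and flags with position_quasirandom() on the same data. Since the quasirandom method is deterministic rather than random, both layers should get identical offsets, which ggplot2's layer_data() can be used to check. This assumes ggflags is installed from GitHub as above:

```r
library(ggplot2)
library(ggbeeswarm)
library(ggflags)

# Points and flags share the same data and the same position adjustment,
# so the quasirandom offsets should be identical for both layers
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_quasirandom(color = "grey40") +
  geom_flag(aes(country = "fr"), position = position_quasirandom())

# Computed (post-position) coordinates of layer 1 (points) and layer 2 (flags)
pts   <- layer_data(p, 1)
flags <- layer_data(p, 2)
```

The alignment only holds when both layers receive the same rows; giving the flag layer a subset changes the offsets it computes.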

tjebo
  • Thank you @tjebo. Your guide is perfect. Now I am able to add the flags of China and India to the graph (please see the photo I just added in the original post). However, there are a few more things I need your help with: – Thuc Dao Jun 09 '22 at 07:11
  • 1. As you can see in my graph, the positions of some Chinese and Indian flags are not the same as the expected ones (the purple dots). How can I fix it? 2. Would you please tell me how to display country codes above or below the flag circles? In the second graph (Population rate in countries of max and min rates), I need the country codes to appear above the flags of max rates, and below the flags of min rates. 3. Please let me know how to display a legend of flags and their full country names. Thank you again for your valuable help. – Thuc Dao Jun 09 '22 at 07:20
  • You can copy my whole code to test. It takes only about 10 seconds to generate all 3 graphs. – Thuc Dao Jun 09 '22 at 07:25
  • @ThucDao this seems worth a follow-up question. Don't forget to link to this question here. Please kindly try to reduce the code to a **minimum**, ideally with data that doesn't need a download and is created as a sample here ([see here how to do that](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)), or use one of the inbuilt data sets like I did. As you can see, I have made an example that is only a few lines, and you have many, many. I am sure you can reproduce your problem with just a few lines too. Maybe use one of the inbuilt flags from the package. – tjebo Jun 09 '22 at 11:32
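For follow-up points 1 and 2, a sketch under stated assumptions. The mismatch most likely comes from position_quasirandom() computing offsets from whatever rows a layer receives, so a flag layer given only the highlighted rows spreads them differently from the full point layer. One workaround is to compute the offsets once with vipor::offsetX() (vipor is the package ggbeeswarm builds on), store the resulting numeric x, and reuse it for points, flags and labels; the labels then go above the maximum and below the minimum of each period by shifting y. The toy data, the columns `x_base`, `x_num`, `label_y`, and the 2.5 label offset are all hypothetical:

```r
library(dplyr)
library(ggplot2)
library(ggflags)  # devtools::install_github("jimjam-slam/ggflags")

# Hypothetical toy data: three periods x six countries (lower-case ISO2 codes)
toy <- data.frame(
  Period      = rep(c("1950-1955", "1955-1960", "1960-1965"), each = 6),
  Code        = rep(c("cn", "in", "fr", "de", "br", "ng"), times = 3),
  NaturalRate = c(20, 15, 8, 6, 30, 28,
                  18, 21, 7, 5, 29, 31,
                  12, 22, 6, 4, 27, 33),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(
    x_base = as.integer(factor(Period)),
    # quasirandom offsets computed ONCE, on all points of each period
    x_num  = x_base + vipor::offsetX(NaturalRate, x_base, width = 0.2)
  ) %>%
  group_by(Period) %>%
  mutate(
    is_max = NaturalRate == max(NaturalRate),
    is_min = NaturalRate == min(NaturalRate)
  ) %>%
  ungroup()

# Flagged rows reuse x_num; label above the max flag, below the min flag
flag_rows <- toy %>%
  filter(is_max | is_min) %>%
  mutate(label_y = NaturalRate + ifelse(is_max, 2.5, -2.5))

p <- ggplot(toy, aes(x = x_num, y = NaturalRate)) +
  geom_point(color = "grey60") +
  geom_flag(data = flag_rows, aes(country = Code), size = 5) +
  geom_text(data = flag_rows, aes(y = label_y, label = toupper(Code)), size = 3) +
  # x is numeric now, so put the period names back on the axis
  scale_x_continuous(breaks = 1:3, labels = levels(factor(toy$Period)))
```

Because every layer reads the same precomputed `x_num`, each flag and label should sit on its own point instead of being re-spread by a second position adjustment.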