
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

df8 <- read.csv('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')

In the first dataset, countries are grouped by continent.

The second dataset contains each country and its population.

How can I combine the population information in dataset 2 with the continent information in dataset 1?


  • Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) –  May 04 '20 at 13:51

1 Answer


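With the dplyr package, all you have to do is join the two data frames on a common variable, in this case the country name. The column is called countryName in one data frame and country_name in the other, so we tell left_join() that the two columns hold the same variable. After the join, the code converts population to a number, keeps one row per country so that no population is counted twice, and sums the populations by region.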
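The key part is the `by = c("countryName" = "country_name")` mapping. As a minimal sketch of what it does, here is the same left join written with base R's `merge()` (swapped in so it runs without dplyr) on two tiny hypothetical data frames that only mimic the column names of the two CSV files. The countries, the spelling mismatch, and the population numbers are all made up for illustration.

```r
# Hypothetical toy data that only mimics the column names of the two CSV files.
report <- data.frame(
  countryName = c("France", "Kenya", "Japan", "Czechia"),
  region      = c("Europe", "Africa", "Asia", "Europe"),
  stringsAsFactors = FALSE
)
pops <- data.frame(
  country_name = c("France", "Japan", "Kenya", "Czech Republic"),
  population   = c(65, 126, 54, 11),  # made-up values, in millions
  stringsAsFactors = FALSE
)

# Same idea as left_join(report, pops, by = c("countryName" = "country_name")):
# by.x / by.y name the key column in each data frame, and all.x = TRUE keeps
# every row of `report`, even when no population row matches (a left join).
joined <- merge(report, pops,
                by.x = "countryName", by.y = "country_name",
                all.x = TRUE)
print(joined)
```

`all.x = TRUE` is what makes it a left join: "Czechia" stays in the result with `population` set to `NA` because the two toy tables spell the name differently. Unmatched rows like this are one reason to sum with `na.rm = TRUE`.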

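library(dplyr)    # install.packages("dplyr") if needed
library(stringr)  # install.packages("stringr") if needed

df %>% 
    # match countryName in df with country_name in df8
    left_join(df8, by = c("countryName" = "country_name")) %>% 
    # population is stored as text with comma separators; strip them and convert
    mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
    # keep one row per country so each population is counted only once
    group_by(countryName) %>%
    slice_tail(n = 1) %>% 
    # add up the populations of the countries in each region
    group_by(region) %>% 
    summarize(population = sum(population, na.rm = TRUE))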

# A tibble: 5 x 2
  region   population
* <chr>         <dbl>
1 Africa   1304908713
2 Americas 1019607512
3 Asia     4592311527
4 Europe    738083720
5 Oceania    40731992
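The `group_by(countryName)` plus `slice_tail()` step matters if, as its use suggests, the first file has several rows per country (for example one per day): after the join, each country's population is repeated on every one of those rows. A minimal base R sketch on hypothetical data shows the double counting and the fix; `duplicated(..., fromLast = TRUE)` is swapped in here to keep the last row of each country, as `slice_tail(n = 1)` does per group.

```r
# Hypothetical joined data: one row per country per day, so each
# country's population is repeated on every one of its rows.
joined <- data.frame(
  day         = c(1, 2, 1, 2, 1),
  countryName = c("France", "France", "Kenya", "Kenya", "Japan"),
  region      = c("Europe", "Europe", "Africa", "Africa", "Asia"),
  population  = c(65, 65, 54, 54, 126),  # made-up values, in millions
  stringsAsFactors = FALSE
)

# Summing over all rows counts France and Kenya twice.
naive <- aggregate(population ~ region, data = joined, FUN = sum)

# Keep only the last row of each country, then sum per region.
one_per_country <- joined[!duplicated(joined$countryName, fromLast = TRUE), ]
totals <- aggregate(population ~ region, data = one_per_country, FUN = sum)
print(naive)
print(totals)
```

Here `naive` reports 130 for Europe while `totals` reports 65, the population of the single European country in the toy data.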
Sergio Romero
  • Thank you. The problem is that in the first dataset, countries are grouped by continent, while the second dataset has countries and their populations. What I need is the population of each continent, e.g. Europe = 400 million, Asia = 2.4 billion. – vahit ünsal May 04 '20 at 13:53
  • Ah, I see. Let me fix my answer. – Sergio Romero May 04 '20 at 13:55
  • I've added a correction to my answer, let me know if that's what you wanted to do. – Sergio Romero May 04 '20 at 14:07
  • Yeah, but why doesn't it work for me? It gives this error: Error in df %>% left_join(df8, by = c(countryName = "country_name")) %>%: Function "%>%" not found – vahit ünsal May 04 '20 at 14:20
  • You need to install the dplyr package first: install.packages("dplyr") – Sergio Romero May 04 '20 at 14:21
  • It gave this error: Error in str_remove_all(population, ","): The function "str_remove_all" could not be found. – vahit ünsal May 04 '20 at 14:33
  • Ah, I apologize. I also used the stringr package because the population values were stored as text with comma thousands separators. Just run install.packages("stringr") too. Those are all the packages you need, I promise! Alternatively, you can replace str_remove_all(population, ",") with base R's gsub(",", "", population). – Sergio Romero May 04 '20 at 14:34
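Since the thread ends on missing packages, here is a sketch of the same computation in base R only, with `merge()`, `gsub()`, `duplicated()` and `aggregate()` swapped in for `left_join()`, `str_remove_all()`, `slice_tail()` and `group_by()`/`summarize()`. The function name `continent_population` is hypothetical, and it assumes the column names used in the answer (countryName and region in the first file, country_name and population in the second). One difference from the dplyr version: a region with no matched population is dropped instead of being reported as 0.

```r
# Hypothetical helper: total population per region, base R only.
# Assumes `report` has countryName and region columns and `pops` has
# country_name and population columns, as used in the answer above.
continent_population <- function(report, pops) {
  # Keep one row per country so repeated rows are not counted twice
  report <- report[!duplicated(report$countryName, fromLast = TRUE),
                   c("countryName", "region")]
  # Population text like "65,000,000" -> 65000000 (gsub replaces str_remove_all)
  pops$population <- as.numeric(gsub(",", "", pops$population))
  # Left join on the differently named key columns
  joined <- merge(report, pops[, c("country_name", "population")],
                  by.x = "countryName", by.y = "country_name", all.x = TRUE)
  # Countries with no population match (NA) are dropped before summing
  joined <- joined[!is.na(joined$population), ]
  aggregate(population ~ region, data = joined, FUN = sum)
}

# Usage with the real files (needs network access):
# df  <- read.csv("https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv")
# df8 <- read.csv("https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv")
# continent_population(df, df8)
```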