Unable to get expected observation using filter in R

Question

I am trying to clean a dataset obtained from Gapminder in R so that I can use it for my own purposes. The dataset in question is Agriculture as a % of GDP. Since this is my first time using R for anything complex, I cleaned the data a bit in Excel before importing it into R. Specifically, I renamed the first column to "country" and saved the file as a CSV so that it could be read easily with read.csv. The modified CSV file can be found here. My aim is to extract the data for the world's top 10 economies into a new dataset. With the CSV file in my working directory, I ran the following code:

library(dplyr)
library(ggplot2)
library(tidyr)
agri<-read.csv("Agriculture (p of GDP).csv")
agri<-gather(agri, "Year", "P of GDP", 2:52)
top_10_economies<-c("United States", "China", "Japan", "Germany", "United Kingdom", "India", "France", "Brazil", "Italy", "Canada")
agri_top_10<-agri%>%filter(country == top_10_economies)

I was expecting the data frame agri_top_10 to contain the data for each of these countries for all of the years, including the NAs. However, the resulting data frame only contained France, Italy, and the United States. To be sure, the rest of the data is still present in the set. For example, running the following

agri2<-agri%>%filter(country == c("China", "India"))

gives the expected result, i.e. a data frame with 102 observations of 3 variables. But adding United States to the vector returns a data frame with 0 observations. Why is this, and how can I fix it?

score 2 · Accepted Answer · answered Jan 03 '18 at 05:54

I feel that maybe you should be using %in% here:

agri_top_10 <- agri %>% filter(country %in% top_10_economies)
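
As a possible explanation for why == behaves so erratically here: == compares two vectors element by element and recycles the shorter one, so it checks row 1 against the first name, row 2 against the second name, and so on, rather than asking whether each row's country appears anywhere in the vector. The sketch below uses a small made-up data frame (toy, not the Gapminder file) laid out in the country-cycling order that gather() produces, and base R only:

```r
# Toy long-format data (hypothetical, not the Gapminder file):
# four countries repeated for three years, countries cycling fastest.
toy <- data.frame(
  country = rep(c("China", "Brazil", "France", "India"), times = 3),
  Year    = rep(c("X1961", "X1962", "X1963"), each = 4),
  stringsAsFactors = FALSE
)

# == recycles the length-2 vector: odd rows are compared with "China",
# even rows with "India". China happens to sit on odd rows and India on
# even rows, so every China and India row matches, purely by luck.
by_luck <- toy$country == c("China", "India")
sum(by_luck)    # 6

# Reversing the vector breaks that alignment and nothing matches.
reordered <- toy$country == c("India", "China")
sum(reordered)  # 0

# %in% tests set membership, so order and length do not matter.
with_in <- toy$country %in% c("India", "China")
sum(with_in)    # 6
```

Whether == returns all, some, or none of the wanted rows therefore depends only on how the row order happens to line up with the comparison vector, which would be consistent with the partial results in the question.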

Tim Biegeleisen

Thanks! This worked perfectly! Though I am still not sure why the previous command failed, as in why it gave me only 3 countries. – Adi Jan 03 '18 at 06:07

@Adi I'm not sure why the previous command _worked_ when you compared to `c("China", "India")`. That it failed for other countries is what I would expect, actually. – Tim Biegeleisen Jan 03 '18 at 06:08

score 2 · Answer 2 · answered Jan 03 '18 at 05:59

If you want to use base R instead of dplyr, the code is fairly simple:

Top10countries = agri[agri$country %in% top_10_economies,]
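
Because %in% silently drops any name it cannot find, it may be worth checking afterwards that every requested country actually occurs in the data. A minimal sketch on made-up data (toy and wanted are hypothetical names, and the lowercase spelling is deliberate):

```r
# Toy long-format data (hypothetical, not the Gapminder file).
toy <- data.frame(
  country = rep(c("China", "India", "United States"), times = 2),
  Year    = rep(c("X1961", "X1962"), each = 3),
  stringsAsFactors = FALSE
)

# One name is spelled differently from the data on purpose.
wanted <- c("China", "united states")

# Base R subsetting with %in%, as in the line above.
subset_rows <- toy[toy$country %in% wanted, ]

# Requested names that never occur in the data (matching is case-sensitive).
unmatched <- setdiff(wanted, unique(toy$country))
unmatched
```

An empty result from setdiff() suggests that all requested names were found; anything it returns points to a spelling or capitalisation difference between the vector and the data.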

Dror Bogin
