I am trying to use data obtained from Gapminder in R and clean it for my purposes. The specific dataset in question is Agriculture as a % of GDP. Since this is my first time using R for anything complex, I cleaned the data a bit in Excel before importing it into R. Specifically, I renamed the first column to "country" and saved the file as a CSV so that it could be read easily with read.csv. The modified CSV file can be found here. My aim is to extract the data for the world's top 10 economies into a new dataset. With the CSV file in my working directory, I ran the following code:
library(dplyr)
library(ggplot2)
library(tidyr)
agri<-read.csv("Agriculture (p of GDP).csv")
agri<-gather(agri, "Year", "P of GDP", 2:52)
top_10_economies<-c("United States", "China", "Japan", "Germany", "United Kingdom", "India", "France", "Brazil", "Italy", "Canada")
agri_top_10<-agri%>%filter(country == top_10_economies)
I was expecting the data frame 'agri_top_10' to contain the data for each of the ten countries for all of the years, including the NAs. However, the resulting data frame only contains France, Italy, and United States. The rest of the data is still present in 'agri'. For example, running the following
agri2<-agri%>%filter(country == c("China", "India"))
gives the expected result, i.e. a data frame with 102 observations of 3 variables. But adding "United States" to the vector returns a data frame with 0 observations. Why is this, and how can I fix it?
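In case it helps, here is a minimal sketch that reproduces the pattern without the CSV. The data frame `toy` and its values are made up for illustration and are not taken from the Gapminder file; `wanted`, `by_equals`, and `by_in` are likewise hypothetical names. It compares `==` against `%in%` in the same filter call.

```r
library(dplyr)

# Toy data (hypothetical values, not from the Gapminder file):
# three countries, two years, in long format like the gathered `agri`.
toy <- data.frame(
  country = rep(c("China", "India", "United States"), times = 2),
  Year    = rep(c("X1961", "X1962"), each = 3),
  value   = c(1, 2, 3, 4, 5, 6)
)

wanted <- c("India", "China")

# `==` compares element by element, recycling `wanted` along the column,
# so a row matches only if its country lines up with the recycled position.
by_equals <- toy %>% filter(country == wanted)

# `%in%` tests membership: each row's country is checked against the whole vector.
by_in <- toy %>% filter(country %in% wanted)

nrow(by_equals)  # 2
nrow(by_in)      # 4
```

On this toy data, `==` keeps only the rows whose position happens to line up with the recycled vector, while `%in%` keeps every row for both countries. That suggests the filter in my code is doing an element-wise comparison rather than a membership test, though I have only checked this on the toy example above.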