
I am working with a dataset that has a column with country codes named "ccode":

[Image: votes tibble]

To add a "country" column with country names, I use the "countrycode" function from the countrycode package, which I downloaded from CRAN, and get the following results:

votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name"))

and the following warning message:

Warning message:
In countrycode(ccode, "cown", "country.name") :
  Some values were not matched unambiguously: 260, 816

[Image: country votes tibble]

Since these country codes cannot be assigned a country name, I filtered them out of the dataframe:

> table(is.na(votes_processed$country))

 FALSE   TRUE 
350844   2703 
> votes_processed <- filter(votes_processed,!is.na(country))
> table(is.na(votes_processed$country))

 FALSE 
350844 

Afterwards I run the following commands to create another tibble with the total number of votes and the proportion of "yes" votes (vote == 1) by year and country:

# Group by year and country: by_year_country
by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

[Image: by_year_country tibble]

Then I run the following command to nest the data by country, and the console shows the following warning and drops my country column:

> nested <- by_year_country %>%
+   nest(-country)
Warning message:
Unknown or uninitialised column: 'country'. 

[Image: nested tibble]

> nested$country
NULL
Warning messages:
1: Unknown or uninitialised column: 'country'. 
2: Unknown or uninitialised column: 'country'. 

Could someone explain to me what is happening with this "country" column, why R is not recognizing it, and what I can do about it?

I am a beginner on this platform. I got a comment asking for a sample of the data, so I am pasting it here:

rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

sample_data_votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)

Thank you very much for your time and advice.

  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data do not make it easy to actually run and test the code. Also make sure to specify where this `countrycode` function comes from. – MrFlick Jul 23 '18 at 16:39
  • @MrFlick thank you very much for your comment. I edited the question to add the sample data and explain that I got the countrycode package from CRAN. – Juan Carlos Gonzalez Ibarguen Jul 23 '18 at 17:37

2 Answers


by_year_country is still grouped, so you need to ungroup it first and then nest:

library(tidyverse)
by_year_country %>%
  ungroup() %>%
  nest(-country) %>%
  head(n = 2)

# A tibble: 2 x 2
  country   data            
 <chr>     <list>          
1 Guatemala <tibble [2 x 3]>
2 Haiti     <tibble [2 x 3]>
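
For readers on a newer tidyr, the same idea can be sketched as below. This is only an illustration under stated assumptions: it assumes tidyr 1.0 or later, where nest() takes a named column specification such as data = -country, and it uses a small made-up tibble (toy_votes is a hypothetical name, not the asker's data) so that it runs without the countrycode package.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for votes_processed (values are made up)
toy_votes <- tibble(
  year    = c(1947, 1947, 1949, 1949, 1949),
  country = c("Haiti", "Guatemala", "Haiti", "Guatemala", "Guatemala"),
  vote    = c(1, 1, 2, 1, 3)
)

# Same summary as in the question; the result is still grouped by year
by_year_country <- toy_votes %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

# Drop the leftover grouping, then nest every column except country
# (assumes the tidyr >= 1.0 interface: new_column = <column selection>)
nested <- by_year_country %>%
  ungroup() %>%
  nest(data = -country)

names(nested)
nested$country
```

After ungroup(), country is an ordinary column again, so it stays outside the nested data column and nested$country should no longer be NULL.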
A. Suliman

It looks like you need to remove the -country part from your call to nest:

library(dplyr)
library(tidyr)
library(countrycode)
rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)
votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name")) %>% 
  filter(!is.na(country))

by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

nested <- by_year_country %>%
  nest()

Having -country told nest to use every column except country. By default, nest uses all columns except the grouping columns. by_year_country is still a grouped tibble: the summarize call removes only the last level of grouping, so the result is no longer grouped by country but is still grouped by year. If you want to remove the grouping entirely, use ungroup().
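
One way to check the leftover grouping described above is with dplyr's group_vars(). The sketch below is only an illustration: toy_votes is a hypothetical stand-in for votes_processed, and the .groups = "drop" argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for votes_processed (values are made up)
toy_votes <- tibble(
  year    = c(1947, 1947, 1949, 1949),
  country = c("Haiti", "Guatemala", "Haiti", "Guatemala"),
  vote    = c(1, 1, 2, 1)
)

# summarize() peels off only the last grouping variable (country)
by_year_country <- toy_votes %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

# Grouping variables that remain after summarize()
remaining <- group_vars(by_year_country)
remaining

# Alternative to a separate ungroup(): drop all grouping inside summarize()
# (assumes dplyr >= 1.0.0, which added the .groups argument)
flat <- toy_votes %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1),
            .groups = "drop")

group_vars(flat)
```

If remaining still lists year, the tibble is grouped, and any later nest() call will treat year as a grouping column.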

see24