I am using data from the CSES (Comparative Study of Electoral Systems) to evaluate the ideological distance between voters and parties.

I used a case_when approach from this question: Changing row names in a data_frame from letters to numbers in R

It worked very well for some variables, but when I use the same code with similar variables (all of them numeric), it yields the following error:

Error in mutate_impl(.data, dots) :
  Evaluation error: RHS of case 6 (ex_ideolparty_F) must be type double, not integer.

The data I am using is provided here: http://www.cses.org/datacenter/imd/data/cses_imd_r.zip

I have only made a few transformations in it before using case_when. This is the exact code I've run before the error:

library(dplyr)
library(descr)

load("/cses_imd.rdata")

##### DATA CLEANING/RENAMING #####

cses <- cses_imd %>%
  rename(election = IMD1004, country = IMD1006_NAM, type = IMD1009, age = IMD2001_1, gender = IMD2002,
         education = IMD2003, income = IMD2006, party = IMD3005_3, party_int = IMD3005_4, ideol_self = IMD3006,
         turnout = IMD5006_1, turnout_VAP = IMD5006_2, compulsory = IMD5007) %>%
  select(election, country, type, age, gender, education, income, starts_with("IMD3002"), starts_with("IMD3004"),
         party, party_int, ideol_self, starts_with("IMD3007"), turnout, turnout_VAP, compulsory,
         starts_with("IMD500"), starts_with("IMD501"))

### MORE RENAMING:

names(cses) <- gsub("IMD3002", "vote", names(cses))
names(cses) <- gsub("IMD3004", "prevote", names(cses))
names(cses) <- gsub("IMD3007", "ideolparty", names(cses))
names(cses) <- gsub("IMD5000", "numparty", names(cses))
names(cses) <- gsub("IMD5012", "ex_ideolparty", names(cses))
names(cses) <- gsub("IMD5013", "formula_house", names(cses))
names(cses) <- gsub("IMD5014", "formula_pres", names(cses))

cses$year <- as.numeric(substr(cses$election, 5, 8))


###### PERCEIVED IDEOLOGY OF THE PARTY VOTED #####

cses <- cses %>% mutate(
  ideol_voted_PR1 = case_when(
    numparty_A == vote_PR_1 ~ ideolparty_A,
    numparty_B == vote_PR_1 ~ ideolparty_B,
    numparty_C == vote_PR_1 ~ ideolparty_C,
    numparty_D == vote_PR_1 ~ ideolparty_D,
    numparty_E == vote_PR_1 ~ ideolparty_E,
    numparty_F == vote_PR_1 ~ ideolparty_F,
    numparty_G == vote_PR_1 ~ ideolparty_G,
    numparty_H == vote_PR_1 ~ ideolparty_H,
    numparty_I == vote_PR_1 ~ ideolparty_I,
    TRUE                    ~ vote_PR_1
  )
)

And here is where the problem happens:

##### PERCEIVED IDEOLOGY OF PARTY VOTED (EXPERT PLACEMENT):

cses <- cses %>% mutate(
  ideol_ex_PR1 = case_when(
    numparty_A == vote_PR_1 ~ ex_ideolparty_A,
    numparty_B == vote_PR_1 ~ ex_ideolparty_B,
    numparty_C == vote_PR_1 ~ ex_ideolparty_C,
    numparty_D == vote_PR_1 ~ ex_ideolparty_D,
    numparty_E == vote_PR_1 ~ ex_ideolparty_E,
    numparty_F == vote_PR_1 ~ ex_ideolparty_F,
    numparty_G == vote_PR_1 ~ ex_ideolparty_G,
    numparty_H == vote_PR_1 ~ ex_ideolparty_H,
    numparty_I == vote_PR_1 ~ ex_ideolparty_I,
    TRUE                    ~ vote_PR_1
  )
)

Why would this happen? I've checked all the columns used here, and I can't see anything different about case 6 (ex_ideolparty_F) compared with the other cases, or with the cases in the first case_when call, which worked fine. All of these columns are numeric, not double.

1 Answer

Similar to if_else, all returned values must be the same type, and in this sense double is not the same as integer, even though is.numeric() returns TRUE for both.
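
To make the distinction in the error message concrete, here is a minimal base-R sketch. The two vectors are made-up toy values, not the CSES data:

```r
# Base R only: integer and double are distinct storage types,
# even though is.numeric() returns TRUE for both.
x_int <- c(5L, 4L)   # the L suffix creates an integer vector
x_dbl <- c(6, 7)     # plain numbers are stored as double

is.numeric(x_int)    # TRUE
is.numeric(x_dbl)    # TRUE
typeof(x_int)        # "integer"
typeof(x_dbl)        # "double"
```

This is why "all of them are numeric" can be true while a type-strict function still refuses to mix them.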

If you look at your data, you have differences:

str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of  10 variables:
#  $ ex_ideolparty_A: num  6 6 6 6 6 6 6 6 6 6 ...
#  $ ex_ideolparty_B: num  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_C: num  7 7 7 7 7 7 7 7 7 7 ...
#  $ ex_ideolparty_D: num  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_E: num  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_F: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_G: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_H: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_I: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ vote_PR_1      : int  9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...
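
With many columns, it can be quicker to tabulate the storage types than to read the str output. A rough sketch on a toy data frame (hypothetical values; only the column names are borrowed from the question):

```r
# Toy data frame standing in for a few of the CSES columns.
toy <- data.frame(
  ex_ideolparty_A = c(6, 6.5),               # double
  ex_ideolparty_F = c(5L, 5L),               # integer
  vote_PR_1       = c(9999996L, 9999996L)    # integer
)

# One storage type per column, as a named character vector.
col_types <- sapply(toy, typeof)
col_types

# Group the column names by storage type to see which ones differ.
split(names(col_types), col_types)
```

Any column that lands in a different group from the others is a candidate for the mismatch.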

Depending on your data, if all are intended to be integers, then you can fix it with:

cses <- cses %>%
    mutate_at(vars(ex_ideolparty_A, ex_ideolparty_B, ex_ideolparty_C, ex_ideolparty_D, ex_ideolparty_E, ex_ideolparty_F, ex_ideolparty_G, ex_ideolparty_H, ex_ideolparty_I, vote_PR_1),
              as.integer)
str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of  10 variables:
#  $ ex_ideolparty_A: int  6 6 6 6 6 6 6 6 6 6 ...
#  $ ex_ideolparty_B: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_C: int  7 7 7 7 7 7 7 7 7 7 ...
#  $ ex_ideolparty_D: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_E: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_F: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_G: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_H: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_I: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ vote_PR_1      : int  9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...

And then your case_when will work without error.

(You might prefer as.numeric if there is even the chance that something is non-integral.)
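
As a sketch of the as.numeric route, the example below uses a small toy data frame with made-up values and only two parties. It swaps mutate_at for across(), which assumes dplyr 1.0.0 or later; that substitution is mine, not part of the code above.

```r
library(dplyr)

# Toy stand-in for the CSES data (hypothetical values, two parties only).
toy <- data.frame(
  numparty_A      = c(1L, 1L, 1L),
  numparty_F      = c(2L, 2L, 2L),
  ex_ideolparty_A = c(6.5, 6.5, 6.5),      # double
  ex_ideolparty_F = c(5L, 5L, 5L),         # integer
  vote_PR_1       = c(1L, 2L, 9999996L)    # integer
)

# Coerce every column that can appear on a right-hand side to double.
toy <- toy %>%
  mutate(across(c(starts_with("ex_ideolparty"), vote_PR_1), as.numeric))

# All right-hand sides are now double, so the types agree.
toy <- toy %>%
  mutate(
    ideol_ex_PR1 = case_when(
      numparty_A == vote_PR_1 ~ ex_ideolparty_A,
      numparty_F == vote_PR_1 ~ ex_ideolparty_F,
      TRUE                    ~ vote_PR_1
    )
  )

toy$ideol_ex_PR1
```

The first row matches party A, the second matches party F, and the third falls through to the TRUE branch and keeps the vote_PR_1 code.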

r2evans
  • Can't believe I didn't see that. Thanks. It does have non-integral values, so I made it a little tidier with `cses_new <- cses %>% mutate_at(vars(starts_with("ex_ideolparty"), vote_PR_1), as.numeric)` – Guilherme Pires Arbache Jul 06 '19 at 21:49
  • Yeah, my solution was brute-force, just to get things started. You have more sensible code there. – r2evans Jul 07 '19 at 02:34