
I have a dataset called CSES (Comparative Study of Electoral Systems) where each row corresponds to an individual (one interview in a public opinion survey), from many countries and many different years.

I need to create a variable that identifies the ideology of the party each person voted for, as perceived by that same person.

However, the dataset identifies the perceived ideology of each party (like many other variables) by letters A, B, C, etc. The party each person voted for, in contrast, is recorded as a UNIQUE CODE NUMBER, and that number does not correspond to the same letter across different years (i.e., the same party can have a different letter in different years; and, of course, it is never the same party across different countries, since each country has its own political parties).

Fictitious data to help clarify, reproduce and create a code:

Let’s say:

country = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)

year = c(2000, 2000, 2004, 2004, 2002, 2002, 2004, 2008, 2000, 2000, 2000, 2000)

party_A_number = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31)

party_B_number = c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32)

party_C_number = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33)

party_voted = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31)

ideology_party_A <- floor(runif(12, min = 1, max = 10))

ideology_party_B <- floor(runif(12, min = 1, max = 10))

ideology_party_C <- floor(runif(12, min = 1, max = 10))

Let’s call the variable I want to create “ideology_voted”:

I need something like:

IF party_A_number == party_voted, THEN ideology_voted = ideology_party_A

IF party_B_number == party_voted, THEN ideology_voted = ideology_party_B

IF party_C_number == party_voted, THEN ideology_voted = ideology_party_C

The real dataset has 9 letters for (up to) 9 main parties in each country, and dozens of countries and election-years. Therefore, it would be great to have code that iterates through the letters A-I instead of “if voted party A, then …; if voted party B, then …”.

Nevertheless, I am having trouble even when I try longer, repetitive code (one transformation for each party letter, which would give me 9 lines of code).

  • Hi, this has been asked and answered before, for example here: https://stackoverflow.com/questions/30339765/create-new-variable-based-on-other-columns-using-r please make sure you searched carefully before asking a new question. – snaut Jan 30 '19 at 11:56
  • Thanks for your reply, but I can't see how these two questions are the same thing. I am trying hard to figure out how to translate that code (as well as other ones I found) for my problem, but it does not seem to be the case. In the question you referred to, it is only needed to know whether someone in a column is a mother or a father (i.e., if it is in another column, regardless of the row). In my situation, I have to identify, for each column, what is the value of either of six other columns I should pick for my new value, based on a third variable. – Guilherme Pires Arbache Jan 31 '19 at 17:55
  • Without any data structure it's hard to answer this question. Please give one or two lines of data (could be made up, if you can't share actual data) and a manually computed output for these lines so people can better understand what you want to compute. (This post helps a lot with this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ) – snaut Feb 01 '19 at 09:46
  • Thanks. It was very confusing before, sorry. Now I believe it is clear. – Guilherme Pires Arbache Feb 02 '19 at 19:03

1 Answer

library(tidyverse)

df <- tibble(
  country = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  year = c(2000, 2000, 2004, 2004,  2002, 2002, 2004, 2008, 2000, 2000, 2000, 2000),
  party_A_number = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31),
  party_B_number = c(12,  12,  11,  11,  22, 22, 21, 22, 32, 32, 32, 32),
  party_C_number = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33),
  party_voted = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31),
  ideology_party_A = floor(runif (12, min = 1, max = 10)), 
  ideology_party_B = floor(runif (12, min = 1, max = 10)),
  ideology_party_C = floor(runif (12, min = 1, max = 10))
)

> df
# A tibble: 12 x 9
   country  year party_A_number party_B_number party_C_number party_voted ideology_party_A ideology_party_B
     <dbl> <dbl>          <dbl>          <dbl>          <dbl>       <dbl>            <dbl>            <dbl>
 1       1  2000             11             12             13          12                9                3
 2       1  2000             11             12             13          13                2                6
 3       1  2004             12             11             13          12                3                8
 4       1  2004             12             11             13          11                7                8
 5       2  2002             21             22             23          21                2                7
 6       2  2002             21             22             23          24                8                2
 7       2  2004             22             21             23          23                1                7
 8       2  2008             23             22             21          22                7                7
 9       3  2000             31             32             33          31                4                3
10       3  2000             31             32             33          32                7                5
11       3  2000             31             32             33          33                1                6
12       3  2000             31             32             33          31                2                1
# ... with 1 more variable: ideology_party_C <dbl>

It seems you're after conditioning using case_when:

ideology_voted <- df %>% transmute(
  ideology_voted = case_when(
    party_A_number == party_voted ~ ideology_party_A,
    party_B_number == party_voted ~ ideology_party_B,
    party_C_number == party_voted ~ ideology_party_C,
    TRUE                          ~ NA_real_  # no lettered party matches the vote
  )
)

> ideology_voted
# A tibble: 12 x 1
   ideology_voted
            <dbl>
 1              3
 2              7
 3              3
 4              8
 5              2
 6             NA
 7              8
 8              7
 9              4
10              5
11              6
12              2
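Row 6 voted for party 24, which is none of the lettered parties in that row, so it is worth checking how many such respondents exist before trusting the new variable. A minimal base R sketch, assuming the party columns from the question; the helper names `number_mat` and `has_match` are made up for illustration:

```r
# Party codes from the question's fictitious data.
party_A_number <- c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31)
party_B_number <- c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32)
party_C_number <- c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33)
party_voted    <- c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31)

# One column per party letter; party_voted is recycled down each column,
# so every cell is compared with the vote of its own row.
number_mat <- cbind(party_A_number, party_B_number, party_C_number)
has_match  <- rowSums(number_mat == party_voted, na.rm = TRUE) > 0

# Rows whose vote matches no lettered party.
unmatched_rows <- which(!has_match)
unmatched_rows
# 6
```

In the real data these rows would typically be votes for minor parties or missing-value codes, so they probably should not receive an ideology score.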

Note that case_when checks the conditions in order, so for each row the first condition that is true determines the result (if it happens that more than one is actually true, say).
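For the real data with letters A to I, writing one condition per letter gets repetitive. The following is a sketch of one possible way to loop over the letters with base R matrix indexing; it is not part of the original answer. It assumes the columns follow the naming pattern `party_<letter>_number` and `ideology_party_<letter>`. The ideology values are fixed rather than random so the result is reproducible: A and B are taken from the tibble printed above, while the C values are hypothetical because that column is truncated in the printout.

```r
# Toy data from the question. Ideology values for C are hypothetical.
df <- data.frame(
  party_A_number   = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31),
  party_B_number   = c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32),
  party_C_number   = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33),
  party_voted      = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31),
  ideology_party_A = c(9, 2, 3, 7, 2, 8, 1, 7, 4, 7, 1, 2),
  ideology_party_B = c(3, 6, 8, 8, 7, 2, 7, 7, 3, 5, 6, 1),
  ideology_party_C = c(5, 7, 4, 2, 9, 1, 8, 3, 6, 2, 6, 4)
)

# Build the column names from the letters; use LETTERS[1:9] for A to I.
party_letters <- c("A", "B", "C")
number_cols   <- paste0("party_", party_letters, "_number")
ideology_cols <- paste0("ideology_party_", party_letters)

number_mat   <- as.matrix(df[number_cols])
ideology_mat <- as.matrix(df[ideology_cols])

# TRUE where a lettered party's code equals the respondent's vote.
# party_voted is recycled down each column, so the comparison is row-wise.
is_match <- number_mat == df$party_voted

# Position of the first matching letter in each row, NA if none matches.
match_col <- apply(is_match, 1, function(row) which(row)[1])

# Pick, for each row, the ideology in the matched column; index rows
# containing NA yield NA in the result.
df$ideology_voted <- ideology_mat[cbind(seq_len(nrow(df)), match_col)]
df$ideology_voted
# 3 7 3 8 2 NA 8 7 4 5 6 2
```

As with case_when, the first matching letter wins if a code appears under more than one letter in the same row, and respondents whose vote matches no letter get NA.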

Werner