Column manipulation in R - matching correct names

Question

I have a data.frame composed of multiple columns and thousands of rows. Below I attempt to display its (head):

|year           |state_name|idealPoint|   vote_no|  vote_yes| 
|:--------------|---------:|---------:|---------:|---------:|
|1971           |   China  |  -25.0000|   31.0000|   45.4209|
|1972           |   China  |  -26.2550|   38.2974|   45.4209|
|1973           |   China  |   28.2550|   35.2974|   45.4209|
|1994           |   Czech  |   27.2550|   34.2974|   45.4209|

As you can see. Not all countries [there are 196 of them] joined voting at the UN in the same year.

What I want to do is to create a new column in my data.frame (votes) that consists of the absolute difference between ChinaIdealpoints to Czech Ideal points (for given year...). I know how to create the new column with dplyr but how do I multiply correct countries from the list of 196 countries? (the difference between the year of joining can be then deleted manually I think).

The final Output should be new data.frame (or new columns in votes) looking like this: China ideal point in 1994 was, for instance, 2.2550

|year           |state_name|idealPoint|Abs.Difference China_Czech
|:--------------|---------:|---------:|-------------------------:|
|1971           |   China  |  -25.0000|                   NA     |
|1972           |   China  |  -26.2550|                   NA     |
|1973           |   China  |   28.2550|                   NA     |
|1994           |   Czech  |   27.2550|                  25.0000 |

Hi and welcome to StackOverflow! For quesitons related to R, please refer to the comprehensive [How to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Oliver Baumann, Oct 21 '18 at 15:11
1. Do you ONLY have China and Czech in the column "state_name"? 2. If no, where are the other countries (you said there is a list of 196)? 3. What do the variables `vote_no` and `vote_yes` have to do with all this? — Roman, Oct 21 '18 at 15:16
My misstake, the variables vote_no and vote_yes are not needed in there. The column state_name goes down and list each country with its given data [Year - Ideal Point value etc]. — Kamil Liskutin, Oct 21 '18 at 15:19

score 0 · Answer 1 · answered Oct 21 '18 at 15:35

Does this perhaps solve your problem?

library(tibble)
library(dplyr)

a <- tribble(
  ~year, ~ctry, ~vote,
  1994, "China", 5,
  1995, "China", 100,
  1996, "China", 600,
  1997, "China", 45,
  1998, "China", 9,
  1994, "Czech_Republic", 1,
  1995, "Czech_Republic", 5,
  1996, "Czech_Republic", 100,
  1997, "Czech_Republic", 40,
  1998, "Czech_Republic", 6,
)

a %>%
  group_by(year) %>%
  mutate(foo = abs(lag(lead(vote) - vote)))

Output:

# A tibble: 10 x 4
# Groups:   year [5]
    year ctry            vote   foo
   <dbl> <chr>          <dbl> <dbl>
 1  1994 China              5    NA
 2  1995 China            100    NA
 3  1996 China            600    NA
 4  1997 China             45    NA
 5  1998 China              9    NA
 6  1994 Czech_Republic     1     4
 7  1995 Czech_Republic     5    95
 8  1996 Czech_Republic   100   500
 9  1997 Czech_Republic    40     5
10  1998 Czech_Republic     6     3

You'll have to filter down the data to fit your needs, e.g. by country.

I guess it will solve the problem somehow, I am just new to it and had no idea how to select the data from the column. Well, I went ahead and did it manually in excel sheets from which I created new data.table. Too long for sure. Will try to tinker with it once I am rested xD — Kamil Liskutin, Oct 21 '18 at 20:53

score 0 · Accepted Answer · answered Oct 22 '18 at 05:07

Codes:

df1 <- data.frame(year = c(1994,1995,1996,1997,1994,1995,1996,1997),
                  state_name = c("China","China","China","China","Czech_Republic","Czech_Republic","Czech_Republic","Czech_Republic"),
                  idealpoints = c(-25.0000,-26.2550,28.2550,27.2550,-27.0000,-28.2550,29.2550,22.2550),
                  vote_no = c(31.0000,38.2974,35.2974,34.2974,33.0000,36.2974,37.2974,38.2974),
                  vote_yes = c(45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209))

china_df <- df1[df1$state_name == "China",]
czech_df <- df1[df1$state_name == "Czech_Republic",]
china_czech_merge <- merge(china_df,czech_df,by = "year")

china_czech_merge$Abs_diff <- abs(china_czech_merge$idealpoints.x - china_czech_merge$idealpoints.y)

Output:

  year state_name.x idealpoints.x vote_no.x vote_yes.x   state_name.y idealpoints.y vote_no.y vote_yes.y Abs_diff
1 1994        China       -25.000   31.0000    45.4209 Czech_Republic       -27.000   33.0000    45.4209        2
2 1995        China       -26.255   38.2974    45.4209 Czech_Republic       -28.255   36.2974    45.4209        2
3 1996        China        28.255   35.2974    45.4209 Czech_Republic        29.255   37.2974    45.4209        1
4 1997        China        27.255   34.2974    45.4209 Czech_Republic        22.255   38.2974    45.4209        5

I think this will work for you.

Thanks

Amazing, it works and it works out of the bag :) Automatically takes only the year that both countries have data for that Ideal points estimates. Amazing. All I did was to wrap abs equation into *-1 for it to be in negative but its a cosmetic thing for my work. Thank you. — Kamil Liskutin, Oct 22 '18 at 19:02

Column manipulation in R - matching correct names

2 Answers2