1

My Data frame looks like this

enter image description here

Now, I want to add a new column which assigns one (!) specific value to each country. That means, there is only one value for Australia, one for Canada etc. for every year.

It should look like this:

Year Country   R  Ineq Adv   NEW_COL
2018 Australia R1 Ineq1 1    x_Australia
2019 Australia R2 Ineq2 1    x_Australia
1972 Canada    R1 Ineq1 1    x_Canada
...

Is there a smart way to do this?

Appreciate any help!

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
nohomejerome
  • 141
  • 1
  • 5
  • 2
    `paste(Year, Country, sep = "_")`? – Gregor Thomas Dec 04 '20 at 14:43
  • Does this help [How to create a consecutive group number](https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-group-number) ? – markus Dec 04 '20 at 14:44
  • I guess I need several if conditions that check the country name and then assign one value to each country (constant over the years). – nohomejerome Dec 04 '20 at 14:46
  • Could you make your desired output clearer? `x_Australia` isn't super helpful, unless you literally want `paste("x", Country, sep = "_")`. What is `x`? Is it okay if it is the year? That would seem clear and useful. Or do you want it to be a number starting with 1? If two countries have the same year, should `x` be the same for them? Or do you want x = 1 for each country's first year? Or something else? – Gregor Thomas Dec 04 '20 at 15:29
  • Or do you just want a single number, `1` is Australia 2018, `2` is Australia 2019, `3` is Canada 1972, ...? If it's this, do you care about the order? Do you want consecutive values for the same country, increasing by year? – Gregor Thomas Dec 04 '20 at 15:30
  • And the wording in your question is confusing: *"one (!) specific value to each country"* make me think, well, one value per country. But your next sentence, *"one value for Australia, one for Canada etc. for every year"* makes me think one value per country per year. And do you want the vales to be integers? Or are string values okay? – Gregor Thomas Dec 04 '20 at 15:42
  • If what you're after is one integer ID for each country, then `df$NEW_COL = as.integer(factor(df$Country))` is a standard way. – Gregor Thomas Dec 04 '20 at 16:10

3 Answers3

0

you use merge.

x = data.frame(country = c("AUS","CAN","AUS","USA"),
           val1 = c(1:4))
y = data.frame(country = c("AUS","CAN","USA"),
               val2 = c("a","b","c"))
    merge(x,y)
    

      country val1 val2
1     AUS    1    a
2     AUS    3    a
3     CAN    2    b
4     USA    4    c

You just manually create the (probably significantly smaller!) reference table that then gets duplicated in the original table in the merge. As you can see, my 3 row table (with a,b,c) is correctly duplicated up to the original (4 row) table such that every AUS gets "a".

Carlos M.
  • 303
  • 2
  • 7
0

You may use mutate and case_when from the package dplyr:

library(dplyr)

data <- data.frame(country = rep(c("AUS", "CAN"), each = 2))

data <- mutate(data,
               newcol = case_when(
               country == "CAN" ~ 1,
               country == "AUS" ~ 2))
print(data)
Claudio
  • 1,229
  • 6
  • 11
0

You can use mutate and group_indices:

library(dplyr)

Sample data:

   sample.df <- data.frame(Year = sample(1971:2019, 10, replace = T),
                           Country = sample(c("AUS", "Can", "UK", "US"), 10, replace = T))

Create new variable called ID, and assign unique ID to each Country group:

sample.df <- sample.df %>%
  mutate(ID = group_indices(., Country))

If you want it to appear as x_Country, you can use paste (as commented):

sample.df <- sample.df %>%
  mutate(ID = paste(group_indices(., Country), Country, sep = "_"))
jayb
  • 555
  • 3
  • 15