I have a data frame users with two columns, id and country:

id                  country
1                   France
2                   United States
3                   France

I want to add a new column salary which depends on the average salary for a given country.

My first thought was to create a config vector of (country, salary) pairs like this:

salary_country <- c(
   "France"=45000,
   "United States"=50000,
   ...)

And then to create the column like this (using dplyr):

tbl_df(users) %>% 
  mutate(salary = ifelse(country %in% names(salary_country), 
                         salary_country[country], 
                         0))

It works like a charm: if the country is not in my salary_country vector, the salary is set to 0; otherwise it is set to the given salary.

But it is quite slow on a very large data frame, and quite verbose.

Is there a better way to accomplish this?

Sowmya S. Manian
Jerome Cance
  • Make `salary_country` a `data.frame`/`data.table` and `merge()` them with `all = TRUE`; this will give you an `NA` where there is no average salary, which is IMO better than imputing `0`s. Edit: See http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – m-dz Apr 22 '16 at 08:47
  • No need for an explicit `merge` if OP uses `data.table`: the `data.table` join syntax coupled with the `on` argument is enough. See the vignette if needed. – Colonel Beauvel Apr 22 '16 at 08:58

1 Answer

You can use match:

salary_country[match(users$country, names(salary_country))]
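A minimal sketch of how the match lookup could populate the salary column, including the 0 default from the question. The sample data is illustrative: the "Germany" row is a made-up example of a country missing from the lookup, and the as.character() call is only a precaution in case country is stored as a factor.

```r
# Illustrative data; the "Germany" row is a made-up country with no lookup entry
users <- data.frame(id = 1:4,
                    country = c("France", "United States", "France", "Germany"),
                    stringsAsFactors = FALSE)
salary_country <- c("France" = 45000, "United States" = 50000)

# match() gives the position of each country in the lookup names, or NA if absent
idx <- match(as.character(users$country), names(salary_country))

# unname() drops the country names carried over from the named vector
users$salary <- unname(salary_country[idx])

# Optional: reproduce the question's 0 default for unknown countries
users$salary[is.na(users$salary)] <- 0

print(users)
```

Skipping the last assignment leaves NA for unknown countries, which is what the comments on the question recommend over imputing 0.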

Or go for data.table:

library(data.table)

dt = data.table(salary=salary_country, country=names(salary_country))

# keeps every row of users; countries missing from dt get NA as salary
dt[setDT(users), on='country']

#   salary       country id
#1:  45000        France  1
#2:  50000 United States  2
#3:  45000        France  3
Colonel Beauvel
  • Thanks for the reply; merge is a good idea. I finally used left_join from the dplyr package, which is the same idea as your answer. – Jerome Cance Apr 22 '16 at 12:11
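
For reference, a rough sketch of what the left_join approach mentioned in the last comment might look like. The exact code the asker used is not shown in the thread, so the salary_lookup name and the coalesce() step (which restores the 0 default from the question) are assumptions.

```r
library(dplyr)

# Sample data from the question
users <- data.frame(id = 1:3,
                    country = c("France", "United States", "France"),
                    stringsAsFactors = FALSE)
salary_country <- c("France" = 45000, "United States" = 50000)

# Hypothetical helper table: the named vector turned into a two-column lookup
salary_lookup <- data.frame(country = names(salary_country),
                            salary = unname(salary_country),
                            stringsAsFactors = FALSE)

# left_join keeps every user; countries missing from the lookup get NA,
# which coalesce() then replaces with 0 (drop that step to keep the NAs)
result <- users %>%
  left_join(salary_lookup, by = "country") %>%
  mutate(salary = coalesce(salary, 0))

print(result)
```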