
I have a data frame df with a single variable var that takes a handful of distinct values.

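library(dplyr)  # provides %>%, count(), mutate() and recode()
df <- data.frame(var = c(rep('AUS',12), rep('NZ',12), rep('ENG',7), rep('SOC',12), 
                            rep('PAK',11), rep('SRI',17), rep('IND',15)),
                 stringsAsFactors = TRUE)  # var is a factor (<fctr>) in the output below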

df %>% count(var)
# A tibble: 7 x 2
#      var     n
#   <fctr> <int>
# 1    AUS    12
# 2    ENG     7
# 3    IND    15
# 4     NZ    12
# 5    PAK    11
# 6    SOC    12
# 7    SRI    17

Some of these values are related and should be recoded to a common new value.

df %>% mutate(var = recode(var, 'AUS' = 'A', 'NZ' = 'A', 'ENG' = 'A', 
                           'SOC' = 'A', 'PAK' = 'B', 'SRI' = 'B')) %>% count(var)
# A tibble: 3 x 2
#      var     n
#   <fctr> <int>
# 1      A    43
# 2    IND    15
# 3      B    28

Here A replaces 4 values and B replaces 2, and the code above gives the expected result. However, is there a more efficient way to do this, without repeating each new code once per old value (4 times for A, 2 times for B)?

Prradep
  • Something like `replace(var, var %in% to_change_A, 'A')` and respectively for B? – Sotos Oct 02 '17 at 14:42
  • Looks like you have a custom list to recode. Try with `case_when` i.e. `df %>% mutate(var = as.character(var), var = case_when(var %in% c('AUS', 'NZ', 'ENG', 'SOC') ~ 'A', var %in% c('PAK', 'SRI') ~ 'B', TRUE ~ var))` – akrun Oct 02 '17 at 14:42
  • Alternatively, convert replace list to a `data.frame` and do a merge/join. – mt1022 Oct 02 '17 at 14:46
  • Did you mean to keep IND unchanged? Or should IND get a code too? – G5W Oct 02 '17 at 14:47
  • @G5W yes.. (at least for now). – Prradep Oct 02 '17 at 14:48
  • Sotos, akrun, mt1022 thanks for your ideas. I will do that. – Prradep Oct 02 '17 at 14:49
  • In base R, probably easiest / most efficient is `levels(df$var)[levels(df$var) %in% c('AUS', 'NZ', 'ENG', 'SOC')] <- 'A'` and similar for B. – lmo Oct 02 '17 at 14:49
  • @Sotos Tried: `df %>% mutate(var = replace(var, var %in% c('AUS','NZ','ENG','SOC'), "A")) %>% count(var)` Error: `4 NA 43` Warning:`Warning message:In `[<-.factor`(`*tmp*`, list, value = "A") : invalid factor level, NA generated` – Prradep Oct 03 '17 at 09:44
  • Make `var` a character. You get the warning because it's a factor. So something like `df %>% mutate(var = as.character(var), var = replace(var, var %in% c('AUS','NZ','ENG','SOC'), "A")) %>% count(var)` should work – Sotos Oct 03 '17 at 09:47
  • @Sotos Thanks it worked. For recoding to `B`, I have to provide another replace command. Is there any way which could achieve both in a single command? – Prradep Oct 03 '17 at 09:49
  • Single way being that you will not have to call `replace` twice? Because separating `A`s from `B`s is a two line job anyway – Sotos Oct 03 '17 at 09:52
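
The merge/join idea from the comments is only described in words, so here is a minimal sketch of how it might look with dplyr. The names `lookup`, `code`, `recoded` and `counts` are hypothetical, and `var` is assumed to be stored as character so that the join and `coalesce()` work on matching types.

```r
library(dplyr)

df <- data.frame(var = c(rep('AUS', 12), rep('NZ', 12), rep('ENG', 7), rep('SOC', 12),
                         rep('PAK', 11), rep('SRI', 17), rep('IND', 15)),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per value that should be recoded
lookup <- data.frame(var  = c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI'),
                     code = c(rep('A', 4), rep('B', 2)),
                     stringsAsFactors = FALSE)

recoded <- df %>%
  left_join(lookup, by = 'var') %>%
  # rows without a match in the lookup (here IND) keep their original value
  mutate(var = coalesce(code, var)) %>%
  select(-code)

counts <- count(recoded, var)
```

With this layout the mapping lives in a table rather than in the recoding call, so adding a new value means adding a row to `lookup`.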

1 Answer


One way to do this is to use a vector with named entries as a lookup table.

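# Lookup table: the names are the old values, the entries are the new codes
Codes = c(rep('A', 4), rep('B', 2), 'IND') 
names(Codes) = c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI', 'IND')
# Index by the old values; unname() drops the names carried over from Codes
df$var = unname(Codes[as.character(df$var)])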
G5W
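
As a possible variation on the lookup-table idea, the sketch below writes each new code only once and leaves unlisted values (here IND) unchanged. It is an illustration under assumptions, not part of the answer above: `groups`, `old`, `new` and `result` are hypothetical names, and the fallback through `ifelse()` is one choice among several.

```r
df <- data.frame(var = c(rep('AUS', 12), rep('NZ', 12), rep('ENG', 7), rep('SOC', 12),
                         rep('PAK', 11), rep('SRI', 17), rep('IND', 15)))

# Each new code is written once, next to the old values it replaces
groups <- list(A = c('AUS', 'NZ', 'ENG', 'SOC'),
               B = c('PAK', 'SRI'))

# Flatten into a named lookup vector: names are old values, entries are new codes
Codes <- setNames(rep(names(groups), lengths(groups)),
                  unlist(groups, use.names = FALSE))

old <- as.character(df$var)   # works whether var is a factor or a character vector
new <- unname(Codes[old])     # NA where an old value has no entry in Codes
df$var <- ifelse(is.na(new), old, new)

result <- table(df$var)
```

After this, `df$var` is a character vector; wrap the last assignment in `factor()` if a factor is needed downstream.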