0

I have a data frame and I want to add a column. For that purpose, I use dplyr::mutate. However, the values of the column I want to add depend of the value of other column. I use case_when() and the problem is solved, but if there are many cases the code is uncomfortable to write (I show an example below), so I want to know if there is another option (maybe a loop) to simplify this.

The following code works:

NewTable <- Table %>% 
  dplyr::mutate(ColumnB = case_when(
               ColumnA=="2000" ~ 0,
               ColumnA=="2001" ~ 4,
               ColumnA=="2002" ~ 8,
               ColumnA=="2003" ~ 12,
               ColumnA=="2004" ~ 16,
               ColumnA=="2005" ~ 20,
               ColumnA=="2006" ~ 24,
               ColumnA=="2007" ~ 28,
               ColumnA=="2008" ~ 32,
               ColumnA=="2009" ~ 36,
               ColumnA=="2010" ~ 40,
               ColumnA=="2011" ~ 44))

How can I improve it? Thank you very much.

JN_2605
  • 145
  • 6
  • Maybe a lookup table, [see here](https://stackoverflow.com/q/35636315/680068) , possible duplicate. – zx8754 Nov 06 '19 at 11:50
  • Since there is no easy, straight forward function/logic that would map a to 0, b to 4, ..., l to 345, you will have to specify their mapping somewhere. Of course you could write a lokk-up tabel, wrap the mapping in a function, etc but this would not reduce the code overall. or I am missing something about your desire? – mnist Nov 06 '19 at 11:51
  • @wusel Suppose the mapping is easier: a to 0, b to 2, c to 4, d to 6,... Can I simplify it? My problem is that I need to add more assignments and that supposes to add many rows to my code. – JN_2605 Nov 06 '19 at 12:03
  • Provide realistic data, if there is no logic to your groupings then there is only a manual way of recoding columns. – zx8754 Nov 06 '19 at 12:10
  • I have edited the question. That is the particular problem I want to solve. As I need to add more years, I wanted to simplify the code. I'm sorry if it was confusing, I just wanted to propose a general question – JN_2605 Nov 06 '19 at 12:31
  • 3
    This will map 2000 to 0, 2001 to 4, etc. `4 * (as.numeric(as.character(ColumnA)) - 2000)` . If `ColumnA` were already character as opposed to factor the `as.character` could be omitted. Please post reproducible inputs with your posts. See instructions at top of [tag:r]. – G. Grothendieck Nov 06 '19 at 12:56
  • If you don't mind telling us: Why this kind of recoding, why not keep as years? – zx8754 Nov 06 '19 at 13:04
  • 1
    are you counting trimesters ? – Cath Nov 06 '19 at 13:17

5 Answers5

1

Since you have changed your data, you may want to try this:

Table<-data.frame(A = c("2000", "2001", "2002", "2003", "2004"))
Table$B = (as.numeric(as.character(Table$A))-2000)*4 

If you do not have such a pattern, you could generate a list of values with matching A and B, and merge them:

match_list<-data.frame(A = c("2000", "2001", "2002", "2003", "2004"),
                       B = c(0, 4, 8, 16, 20))
merge(Table, match_list, by = "A")  # Table itself does not have B at this stage

This can only save some typing.

Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27
1

This one is a solution using mapvalues from plyr package. I use it often.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# Create a simulated data frame

set.seed(9049)

Table <- data.frame(columnA = as.character(sample(2000:2011, 
                                                  size = 100, 
                                                  replace = TRUE)),
                    stringsAsFactors = FALSE)

# Extract unique levels from column A
colA_levels <- unique(Table$columnA)

# Create unique levels of column B to map to
colB_levels <- (as.numeric(colA_levels) - 2000) * 4

# Use `mapvalues` from `plyr` package
# 
NewTable <- Table %>% 
  mutate(columnB = plyr::mapvalues(columnA,
                                   from = colA_levels,
                                   to = colB_levels))

head(NewTable, 10)
#>    columnA columnB
#> 1     2008      32
#> 2     2011      44
#> 3     2007      28
#> 4     2011      44
#> 5     2001       4
#> 6     2010      40
#> 7     2000       0
#> 8     2007      28
#> 9     2000       0
#> 10    2002       8

Created on 2019-11-06 by the reprex package (v0.3.0)

0

Here's how I'd do it :

lkp <- setNames(c(0, 4, 8, 16, 20),
                c("2000", "2001", "2002", "2003", "2004"))
df <- data.frame(ColumnA = c("2004","2002"),stringsAsFactors = FALSE)
dplyr::mutate(df, ColumnB = lkp[ColumnA])
#>   ColumnA ColumnB
#> 1    2004      20
#> 2    2002       8

Created on 2019-11-06 by the reprex package (v0.3.0)

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
0

Since you are writing the column B, I assume it doesn't exist in Table, so a join could do the trick.

library(dplyr)
NewTable <- left_join( Table,
tibble( A = c("2000", "2001", "2002", "2003", "2004"),
        B = seq(from = 0, to = 32, by = 4) )
-1

Considering OPs comment, you could wirte a function then (probs to this answer):

library(tidyverse)
letter2num <- function(x) {(utf8ToInt(x) - utf8ToInt("a")) * 2}
tibble(x = letters) %>% 
  rowwise() %>% 
  mutate(y = letter2num(x))
mnist
  • 6,571
  • 1
  • 18
  • 41
  • 2
    This answer is not useful, we need to know the logic for recoding. I doubt they have ABC letters in real data. – zx8754 Nov 06 '19 at 12:12
  • The question has been whether OP culd simplfy it if there is an esay mapping. answer: yes, there is and that is how it could work in the case specified. In a diferent setting, OP would have to adapt the code. – mnist Nov 06 '19 at 12:21