
I have a data frame in R with 3 columns in it and millions of rows:

> df
   col1 col2 col3
1   one  1.1    4
2   two  1.5    1
3 three  1.7    5
.    ..   ..   ..

I would like to do a calculation based on two of these columns. I would like to create a column that is basically something like:

if col1 == "one", then result = col2*0.5,
else if col1 == "two", then result = col2*0.6,
else if ...

Short of writing a really big for loop over all the millions of rows, I can't think of a more idiomatic "R" way to do this. Any suggestions?

Thanks!

xtluo
Thomas Moore
  • What's with the downvote? Based on the great answers below, wondering why this was downvoted? – Thomas Moore Jul 15 '17 at 21:12
  • Don't know, question is clear and decent to me. Only thing for future questions: providing some sample data based on the output of dput() is useful, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. This helps others while answering. Could you accept one of the answers to mark the question as being solved? Thanks! – Florian Jul 15 '17 at 21:25

3 Answers


Small example of a possible solution. Not sure if this is the most efficient, but it does the trick.

df = data.frame(col1=c(1,1,2,2,3),col2=c(2,2,2,2,2))

df$col3=NA
df$col3 = ifelse(df$col1==1, df$col2*1.5, df$col3)
df$col3 = ifelse(df$col1==2, df$col2*2.5, df$col3)
df$col3 = ifelse(df$col1==3, df$col2*3.5, df$col3)
  1. If col1==1, then col3=col2*1.5
  2. If col1==2, then col3=col2*2.5
  3. If col1==3, then col3=col2*3.5
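For what it's worth, the three steps above can also be written as one nested ifelse() call. This is only a sketch on the same toy data; the NA_real_ fallback for rows that match no condition is my own assumption, not something the question asked for.

```r
# Toy data mirroring the example above
df <- data.frame(col1 = c(1, 1, 2, 2, 3), col2 = c(2, 2, 2, 2, 2))

# Nested ifelse(): each "no" branch holds the next condition.
# Rows matching none of the conditions end up as NA (assumed fallback).
df$col3 <- ifelse(df$col1 == 1, df$col2 * 1.5,
           ifelse(df$col1 == 2, df$col2 * 2.5,
           ifelse(df$col1 == 3, df$col2 * 3.5, NA_real_)))

df$col3
```

Nesting gets hard to read beyond a handful of conditions, so this is probably only worth it for short chains.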

Hope this helps.

Florian

A vectorized way could be the following.

# make up some data
set.seed(525)
col1 <- sample(c("one", "two", "three"), 20, TRUE)
col2 <- runif(20)
col3 <- rnorm(20)
dat <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

# where to hold the result
result <- numeric(nrow(dat))

# first condition
inx <- dat$col1 == "one"
result[inx] <- dat[inx, "col2"]*0.5

# second condition
inx <- dat$col1 == "two"
result[inx] <- dat[inx, "col2"]*0.6

result
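A possible variation on the same logical-index pattern, in case it helps: write the result straight into the data frame and start from NA instead of 0, so rows that match no condition are easy to spot. The small data set and the `result` column name are made up for illustration.

```r
# Hypothetical data; column names follow the question
dat <- data.frame(col1 = c("one", "two", "three", "one"),
                  col2 = c(1.1, 1.5, 1.7, 2.0),
                  stringsAsFactors = FALSE)

# Start from NA so unmatched rows are visible afterwards
dat$result <- NA_real_

# first condition
inx <- dat$col1 == "one"
dat$result[inx] <- dat$col2[inx] * 0.5

# second condition
inx <- dat$col1 == "two"
dat$result[inx] <- dat$col2[inx] * 0.6

dat
```

Here the "three" row keeps NA because no condition covers it; add another block per remaining value as needed.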
Rui Barradas

I would personally use a key-to-multiplier lookup (a named list acting as a hash map), since nobody wants to write many if-else statements. Check this demo:

1. prepare your data:

> c1 <- c("one", "two", "three")
> c2 <- sample(10, 3)
> df <- data.frame(c1, c2)
> df$c1 <- as.character(df$c1)
> df
     c1 c2
1   one  4
2   two 10
3 three  5

2. define the key-to-multiplier lookup (a named list) using setNames:

> key <- c("one", "two", "three")
> multiplier <- c(0.5, 0.6, 0.7)
> my.multiplier <- setNames(as.list(multiplier), key)
> my.multiplier
$one
[1] 0.5

$two
[1] 0.6

$three
[1] 0.7

3. just one line of code:

> df$c3 <- df$c2 * as.numeric(my.multiplier[df$c1])
> df
     c1 c2  c3
1   one  4 2.0 #4 * 0.5
2   two 10 6.0 #10 * 0.6
3 three  5 3.5 #5 * 0.7
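As a hedged alternative sketch, the same lookup also seems to work with a plain named numeric vector instead of a list, which avoids the as.numeric() conversion. The extra "four" row is hypothetical, added only to show that a key missing from the lookup yields NA rather than an error.

```r
# Hypothetical data; "four" has no multiplier on purpose
df <- data.frame(c1 = c("one", "two", "three", "four"),
                 c2 = c(4, 10, 5, 8),
                 stringsAsFactors = FALSE)

# Named numeric vector used as the lookup table
multiplier <- c(one = 0.5, two = 0.6, three = 0.7)

# Index the lookup by the character column; unname() drops the key names.
# Keys that are not in the lookup come back as NA.
df$c3 <- df$c2 * unname(multiplier[df$c1])
df
```

Note that the key column has to be character for this: indexing by a factor would use its integer codes instead of its labels, which is why the demo above converts with as.character().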
xtluo