
Working with a data frame, I want to change the values in one column based on the values in another column. Here is my reproducible code:

# four items
items <- c("coke", "tea", "shampoo","aspirin")

# scores for each item
score <- as.numeric(c(65,30,45,20))

# making a data frame of the two vectors created
df <- as.data.frame(cbind(items,score))

# score for coke is 65 and for tea it is 30.  I want to
# double score for tea OR coke if the score is below 50

ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

# the above returns NULL values with a warning

#the statement df$score[df$items %in% c("coke", "tea")] does pull coke and tea scores

df$score[df$items %in% c("coke", "tea")]

Many thanks in advance for your help.

s_scolary
seakyourpeak
  • Welcome to StackOverflow. A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful – polka Dec 16 '15 at 18:38
  • Please edit the formatting of your question! – jogo Dec 16 '15 at 18:50

4 Answers


This should do the trick for now:

items <- c("coke", "tea", "shampoo","aspirin")

# scores for each item
score <- as.numeric(c(65,30,45,20))

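Try using data.frame(items, score) instead of as.data.frame(cbind(items, score)). cbind() first combines the two vectors into a character matrix, so in the latter the scores are no longer numeric: they are converted to factors (or, from R 4.0.0 on, to character strings).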

# making a data frame of the two vectors created
df <- data.frame(items, score)

df
    items score
1    coke    65
2     tea    30
3 shampoo    45
4 aspirin    20
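
To see why the original as.data.frame(cbind(items, score)) call misbehaves, it can help to inspect the column types directly. This is a minimal sketch using the same items and score vectors; the names m, df_bad and df_good are made up for illustration.

```r
items <- c("coke", "tea", "shampoo", "aspirin")
score <- c(65, 30, 45, 20)

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are coerced to character strings
m <- cbind(items, score)
is.character(m)            # TRUE

# converting that matrix gives a non-numeric score column
# (a factor in older R versions, character from R 4.0.0 on)
df_bad <- as.data.frame(m)
is.numeric(df_bad$score)   # FALSE

# data.frame() keeps each vector's own type
df_good <- data.frame(items, score)
is.numeric(df_good$score)  # TRUE
```

Either way, a comparison such as score < 50 is only meaningful once the column is numeric.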


# score for coke is 65 and for tea it is 30.  I want to
# double score for tea OR coke if the score is below 50

df$score[df$items %in% c("coke", "tea")] = ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20

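This method is fragile, though. The test passed to ifelse() covers only the selected rows, while df$score*2 and df$score cover every row, so the values line up only when the matching rows happen to be the first rows of the data frame. It breaks, for example, if you end up with duplicate entries for items: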

# New data with an added entry for item = coke and score = 15:
items <- c("coke", "tea", "shampoo","aspirin","coke")
# scores for each item
score <- c(65,30,45,20,15)

# making a data frame of the two vectors created
df <- data.frame(items, score)


# using the method from above, the last entry gets a value of 90
# (twice shampoo's 45) instead of 30
df$score[df$items %in% c("coke", "tea")] = ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20
5    coke    90

So if you may have duplicate entries (or matching rows anywhere other than the top of the data frame), you will have to use this method:

df <- data.frame(items, score)

df$score[df$items %in% c("coke", "tea") & df$score < 50] <- 2* df$score[df$items %in% c("coke", "tea") & df$score < 50]

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20
5    coke    30
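
If you would rather keep ifelse(), one option is to give it a test that spans the whole column, so that the test, yes and no arguments all have the same length and line up row by row. This is a sketch, assuming the five-row data from above:

```r
items <- c("coke", "tea", "shampoo", "aspirin", "coke")
score <- c(65, 30, 45, 20, 15)
df <- data.frame(items, score)

# the condition is evaluated for every row, so all three
# arguments to ifelse() have length nrow(df)
df$score <- ifelse(df$items %in% c("coke", "tea") & df$score < 50,
                   df$score * 2,
                   df$score)

df$score  # 65 60 45 20 30
```

Because nothing is subset on the left-hand side, the position of the matching rows no longer matters.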
s_scolary

Your problem does not need an if statement. You can just combine two logical statements.

Logical 1: df$items %in% c("coke", "tea")

Logical 2: df$score < 50

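By indexing the data frame with these two logical statements combined, you can multiply just the matching scores. In R, & means AND and | means OR. Since you want rows where the item is coke or tea and the score is also below 50, combine the two with &:

df$score[df$items %in% c("coke", "tea") & df$score < 50] <- 2* df$score[df$items %in% c("coke", "tea") & df$score < 50]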
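
The combined condition can also be stored in a logical vector first, which avoids typing it twice. This is a sketch, assuming df was built with data.frame(items, score) so that score is numeric; the name sel is arbitrary. It uses & because the question asks for rows that are coke or tea and that also score below 50.

```r
items <- c("coke", "tea", "shampoo", "aspirin")
score <- c(65, 30, 45, 20)
df <- data.frame(items, score)

# TRUE only where both conditions hold
sel <- df$items %in% c("coke", "tea") & df$score < 50
sel  # FALSE TRUE FALSE FALSE

# double the score only in the selected rows
df$score[sel] <- 2 * df$score[sel]
df$score  # 65 60 45 20
```

Printing sel before assigning is a quick way to check that the filter picks the rows you expect.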

polka
  • Hi polka, the first logical evaluates as expected, but the second one results in NA for all four values: `df$score < 50` returns `[1] NA NA NA NA`. That is probably why I am getting the error. It is strange that df$score gives me the whole vector but df$score < 50 results in four NA values. – seakyourpeak Dec 16 '15 at 19:20
  • Thanks polka!!! It did work. The score vector, the way I created it, was a factor, which caused df$score < 50 to result in NA NA NA NA. I got it working following your suggestion. THANK YOU! – seakyourpeak Dec 16 '15 at 21:49
  • I am one upvote away from being able to turn off ads. If my answer worked, would you please upvote my answer. – polka Dec 16 '15 at 21:50
items <- c("coke", "tea", "shampoo","aspirin")
score <- as.numeric(c(65,30,45,20))   

If you call data.frame() in the following way you avoid converting the score column to a factor.

df <- data.frame(items=items,score=score)

You don't need an if statement. You can simply extract the values you are interested in based on two logical statements:

df[df$score<50 & df$items %in% c("coke", "tea"), "score"] <- 2 * df[df$score<50 & df$items %in% c("coke", "tea"), "score"]

  • df$score<50 & df$items %in% c("coke", "tea") selects the rows that match both conditions, i.e. item either coke or tea and score less than 50.

  • "score" selects only the score column

  • The statement on the right of <- extracts the same value(s) and multiplies them by 2.
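
If the data frame already exists and score has ended up as a factor, the column can be repaired in place instead of rebuilding the data frame. This is a sketch, with the factor column created explicitly so that the example behaves the same in any R version:

```r
items <- c("coke", "tea", "shampoo", "aspirin")
df <- data.frame(items, score = factor(c("65", "30", "45", "20")))

# as.numeric() alone would return the internal level codes,
# so go through the character labels first
df$score <- as.numeric(as.character(df$score))

# same subscripting as above, now on a numeric column
df[df$score < 50 & df$items %in% c("coke", "tea"), "score"] <-
  2 * df[df$score < 50 & df$items %in% c("coke", "tea"), "score"]

df$score  # 65 60 45 20
```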

Tom
  • thanks Tom. Your solution df[df$score<50 & df$items %in% c("coke", "tea"), "score"] <- 2 * df[df$score<50 & df$items %in% c("coke", "tea"), "score"] does make sense but it gives me an error df[df$score<50 & df$items %in% c("coke", "tea"), "score"] results in [1] – seakyourpeak Dec 16 '15 at 19:03
  • Try creating again the df data.frame with the line that I suggested above (`df <- data.frame(items=items,score=score)`). – Tom Dec 16 '15 at 19:11

The syntax for your ifelse statement is not quite correct; it looks like you're trying to invoke it in a way similar to how IF is used in MS Excel. Unfortunately, that doesn't do the trick here.

I would suggest you take an intro to R course (many are available for free online), such as:

https://campus.datacamp.com/courses/free-introduction-to-r/chapter-1-intro-to-basics-1?ex=1

As for your problem, here is one solution (if I am understanding your problem correctly).

item <- c("coke", "tea", "shampoo", "aspirin")
score <- as.numeric(c(65, 30, 45, 20))

df <- data.frame(item, score)

# loop over the rows; seq_len() also behaves correctly for an empty data frame
for (i in seq_len(nrow(df))) {
  if ((df$item[i] == "coke" | df$item[i] == "tea") & df$score[i] < 50) {
    df$score[i] <- df$score[i] * 2
  }
}

View(df)

You'll note that if you now view the updated data frame (df), only the score for item "tea" has been doubled, since it is the only row that meets both criteria (i.e. item is coke OR tea, AND its associated score is below 50).

Hope this helps, and good luck.

Element89
  • Thanks Element89. I am new to R and will look into the link you sent. Your code reads perfectly fine, but I get an error when I paste it into R: `Error in if ((df$item[i] == "coke" | df$item[i] == "tea") & df$score[i] < : missing value where TRUE/FALSE needed`, and in addition `Warning message: In Ops.factor(df$score[i], 50) : < not meaningful for factors`. – seakyourpeak Dec 16 '15 at 19:17
  • Although this works, in general I would recommend avoiding the for loop in favour of vectorisation/subscripting. Circle 3 of the _R Inferno_ by Patrick Burns is a very interesting read (http://www.burns-stat.com). – Tom Dec 16 '15 at 19:20
  • Hey, looked into that! Thanks Tom! – polka Dec 16 '15 at 19:45
  • @seakyourpeak odd, I just re-copied and pasted the code into a blank script and i did not get the error you're describing? I think it's OK, it should do what you want, but as Tom said, there are more elegant ways of getting the same result. – Element89 Dec 16 '15 at 21:29