
ETA: by the way, the point of the below is to avoid iterating through my entire set of column vectors, in case that was going to be proposed as a solution (i.e., just doing what is known to work one column at a time).


There are plenty of examples of replacing values in a single vector (column) of a data frame in R with some other value.

There are also examples of how to replace all values of NA with something else.

What I'm looking for is analogous to the latter question, but replacing one specific value with another. I'm having trouble generating a data frame of logical values that maps onto my actual data frame when the criterion involves multiple columns, or simply doing the operations from the first two questions on more than one column.

An example:

data <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep(1:9), var2 = rep(3:5, each = 3))

data
  name var1 var2
1    a    1    3
2    a    2    3
3    a    3    3
4    b    4    4
5    b    5    4
6    b    6    4
7    c    7    5
8    c    8    5
9    c    9    5

And say I want all of the values of 4 in var1 and var2 to be 10.

I'm sure this is elementary and I'm just not thinking through it properly. I have been trying things like:

data[data[, 2:3] == 4, ]

That doesn't work, but if I do the same with data[, 2] instead of data[, 2:3], things work fine. It seems that logical tests (like is.na()) work across multiple rows/columns, but numerical comparisons aren't playing as nicely?

Thanks for any suggestions!

Hendy

4 Answers


You want to search through the whole data frame for any value that matches the value you're trying to replace, the same way you can run a logical test like replacing all missing values with 10:

data[ is.na( data ) ] <- 10

You can also replace all 4s with 10s:

data[ data == 4 ] <- 10

At least, I think that's what you're after?
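A minimal runnable sketch of the whole-frame approach on the example data from the question. It assumes R 4.0 or later (where data.frame() keeps name as character; older R makes it a factor, which compares the same way here). The NA is injected purely for illustration, to exercise the is.na() line:

```r
# Rebuild the example data from the question
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

# data == 4 is a logical matrix with the same shape as data;
# `[<-.data.frame` only writes the cells where that matrix is TRUE,
# so the character column is left alone ("a" == 4 is FALSE)
data[data == 4] <- 10

# Same pattern with is.na(): add a missing value (illustration only),
# then replace every NA in the frame
data$var2[9] <- NA
data[is.na(data)] <- 10

print(data)
```

Note that var1 and var2 become double rather than integer, because 10 is a double.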

And let's say you wanted to ignore the first column (since it's all letters):

# identify which columns contain the values you might want to replace
data[ , 2:3 ]

# subset it with extended bracketing..
data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
# ..those were the values you're going to replace

# now overwrite 'em with tens
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10

# look at the final data
data
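The same extended-bracketing idea also works with column names instead of positions, which is less fragile if columns get reordered. A sketch, where cols is just an illustrative variable name:

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

# Select the target columns by name rather than by position
cols <- c("var1", "var2")

# data[cols] is a data frame of just those columns, so data[cols] == 4
# is a logical matrix of matching shape; R's replacement semantics
# write the modified subset back into data
data[cols][data[cols] == 4] <- 10

print(data)
```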
Anthony Damico
    I flipping swear I tried this and it wasn't working for me before. I hope to get to the point where I don't kick myself every time I post to SO... By the way, you're the 1min R video guy, aren't you!? Those rock. – Hendy Feb 06 '13 at 21:57

Basically, data[, 2:3] == 4 gave you a logical index for data[, 2:3] instead of for data:

R > data[, 2:3] ==4
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

So you can use it to index data[, 2:3] instead:

R > data[,2:3][data[, 2:3] ==4]
[1] 4 4 4 4
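Another way to act on this explanation is to pad the index out to the full size of data, giving exactly the "data frame of logical values mapped to my actual data frame" the question asks for. A sketch; mask is just an illustrative name:

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

# Start with an all-FALSE logical matrix the same shape as data
mask <- matrix(FALSE, nrow = nrow(data), ncol = ncol(data))

# Fill in only the columns we care about; as.matrix() makes sure the
# comparison result is a plain logical matrix
mask[, 2:3] <- as.matrix(data[, 2:3] == 4)

# mask now lines up cell-for-cell with data, so it can index data directly
data[mask] <- 10

print(data)
```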
liuminzhao
Thanks for this; also works. I just think the one from Anthony is a tad simpler. Big thanks for explaining *why* mine wasn't working though; after playing around some more, I see what you mean: me trying to apply values to data based on a comparison that was *also* subsetting makes a lot more sense. – Hendy Feb 06 '13 at 21:59

Just to provide a different answer, I thought I would write up a vector-math approach:

You can create a transformation matrix (really a data frame here, but it works the same) using the vectorized ifelse() function, and then multiply the transformation matrix element-wise with your original data, like so:

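df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
    # Transformation matrix: .sub_Value/.search_Value where a cell matches, 1 elsewhere;
    # multiplying it element-wise with the data turns each match into .sub_Value.
    # Note: because of the division, this cannot replace zeros (.search_Value = 0).
    .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns] == .search_Value,
                                             .sub_Value / .search_Value, 1) * .data_Frame[, .search_Columns]
    return(.data_Frame)
}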

To replace all values of 4 with 10 in columns 2 through 3 of the data frame 'data', you would use the function like so:

# Either of these will work.  I'm just showing options.
df.Rep(data, 2:3, 4, 10)
df.Rep(data, c("var1","var2"), 4, 10)

#   name var1 var2
# 1    a    1    3
# 2    a    2    3
# 3    a    3    3
# 4    b   10   10
# 5    b    5   10
# 6    b    6   10
# 7    c    7    5
# 8    c    8    5
# 9    c    9    5
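Because df.Rep() builds its multiplier by dividing .sub_Value by .search_Value, a matched 0 ends up as NaN (10/0 is Inf, and Inf * 0 is NaN). The sketch below is a hypothetical variant, df_rep_no_div(), that keeps the same arguments but swaps the multiplication for base R's replace(), so no arithmetic is involved:

```r
# Hypothetical helper (not from the answer above): same arguments as
# df.Rep(), but uses replace() instead of a multiplier, so zeros work too
df_rep_no_div <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value) {
  # List-style subsetting keeps a data frame even for a single column
  target <- .data_Frame[.search_Columns]
  # Logical matrix marking the cells to replace
  hits <- as.matrix(target == .search_Value)
  .data_Frame[.search_Columns] <- replace(target, hits, .sub_Value)
  .data_Frame
}

zeros <- data.frame(name = c("a", "b", "c"),
                    var1 = c(0, 1, 0),
                    var2 = c(2, 0, 2))
result <- df_rep_no_div(zeros, c("var1", "var2"), 0, 10)
print(result)
```

Like df.Rep(), it returns a modified copy and leaves the input data frame unchanged.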
Dinre

Just for continuity

    data[,2:3][ data[,2:3] == 4 ] <- 10

But it looks ugly, so doing it in two steps is better.
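The answer doesn't spell out the two steps; one plausible reading is to compute the logical index once and then assign through it. A sketch under that assumption (idx is an illustrative name):

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

# Step 1: find which cells of columns 2 and 3 equal 4
idx <- data[, 2:3] == 4

# Step 2: assign through that index; same result as the one-liner,
# without repeating data[, 2:3] == 4 inside the brackets
data[, 2:3][idx] <- 10

print(data)
```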

agstudy