Replace a value in a data frame based on a conditional (`if`) statement

Question

In the R data frame created below, I would like to replace every occurrence of B with b.

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12])
colnames(junk) <- c("nm", "val")

This gives:

   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   f
7   C   g
8   D   h
9   A   i
10  B   j
11  C   k
12  D   l

My initial attempt used a for loop and an if statement, like so:

for(i in junk$nm) if(i %in% "B") junk$nm <- "b"

but as I am sure you can see, this replaces ALL of the values of junk$nm with b. I can see why this is doing this but I can't seem to get it to replace only those cases of junk$nm where the original value was B.

NOTE: I managed to solve the problem with gsub, but in the interest of learning R I would still like to know how to get my original approach to work (if that is possible).

you might want to add stringsAsFactors = FALSE to the original data.frame construction. — jimmyb, Apr 28 '11 at 21:00
@jimmyb Why? Factors are useful, and necessary if one is modelling with most of R's modelling code. The correct way of dealing with this is to acknowledge that the data are a factor. If you don't want/need this conversion then you can do as you say. If you do want the factor, then there are easy ways to do the manipulation @Kenny wants to perform. — Gavin Simpson, Apr 28 '11 at 21:18
Factors used to be more popular because of performance. However, now that strings are immutable and hashed, the value of factors is less obvious, as most base R functionality will just convert them directly (albeit with warnings). I think factors cause a significant number of the bugs that I find in people's R code. — jimmyb, Apr 29 '11 at 00:15

diliop · Accepted Answer · 2011-04-28T20:50:01.677

254

Easier to convert nm to characters and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: And if you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)
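Putting the steps above together, a minimal self-contained sketch of the round trip (factor to character, replace, back to factor) might look like this. The data frame is rebuilt here with explicit column names and stringsAsFactors = TRUE so that nm is a factor on any R version; that setup is an assumption made for illustration and is not part of the original answer.

```r
# Rebuild the example data with nm stored as a factor (illustrative setup)
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

junk$nm <- as.character(junk$nm)   # factor -> character
junk$nm[junk$nm == "B"] <- "b"     # replace only the matching rows
junk$nm <- as.factor(junk$nm)      # optional: convert back to a factor

junk
```

Note that as.factor() rebuilds the levels from the values that are present, so "B" no longer appears among them.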

edited Apr 28 '11 at 20:50

answered Apr 28 '11 at 20:11

diliop


4

as.character() makes life so much easier when working with factors. +1 – Brandon Bertelsen Apr 28 '11 at 22:38
6

what if you have multiple columns? – geodex Apr 19 '15 at 21:33
1

@diliop: Thanks for that. What if I want to change junk$nm when it takes any of the values "B", "Y", "Z", ...? – simo Mar 30 '21 at 10:18
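For the multi-value case raised in the last comment, one possible sketch uses %in% with a set of target values. The vector nm and the name targets are made up for illustration, and the set contains only the three values the comment actually names.

```r
# Hypothetical character vector containing several values to replace
nm <- c("A", "B", "Y", "Z", "C")
targets <- c("B", "Y", "Z")      # illustrative set of values to replace

nm[nm %in% targets] <- "b"       # replace every element that is in the set
nm
# [1] "A" "b" "b" "b" "C"
```

The same indexing works on a data frame column, for example junk$nm[junk$nm %in% targets] <- "b", provided the column is a character vector.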

score 48 · Answer 2 · edited Nov 11 '18 at 22:10


Another useful way to replace values:

library(plyr)
junk$nm <- revalue(junk$nm, c("B"="b"))
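If plyr is not available, a similar recoding can be sketched in base R with a named lookup vector. This is an alternative technique rather than what revalue() does internally, and the names lookup and hit are illustrative.

```r
nm <- rep(LETTERS[1:4], 3)            # character vector standing in for junk$nm

lookup <- c(B = "b")                  # old value = new value; add more pairs as needed
hit <- nm %in% names(lookup)          # which elements have a replacement
nm[hit] <- unname(lookup[nm[hit]])    # look up the new value by the old one
nm
```

Values without an entry in lookup are left untouched, which matches the intent of replacing only "B".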

edited Nov 11 '18 at 22:10

qwr


answered Dec 14 '13 at 16:27

Oriol Prat


Marek · Answer 3 · 2011-04-28T20:37:16.807

30

Short answer is:

junk$nm[junk$nm %in% "B"] <- "b"

Take a look at "Index vectors" in An Introduction to R (if you haven't read it yet).

EDIT: As noted in the comments, this solution works for character vectors, so it fails on your data, where nm is a factor.

For a factor, the best way is to change the level:

levels(junk$nm)[levels(junk$nm)=="B"] <- "b"
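A small sketch of the difference between the two approaches on a factor, using a standalone factor f as a stand-in for junk$nm (the names f and g are illustrative):

```r
f <- factor(rep(LETTERS[1:4], 3))

# Assigning a value that is not an existing level produces NA.
# R normally warns "invalid factor level, NA generated"; the warning
# is suppressed here only to keep the example quiet.
g <- f
suppressWarnings(g[g == "B"] <- "b")
sum(is.na(g))
# [1] 3

# Renaming the level changes every "B" at once and adds no NA
levels(f)[levels(f) == "B"] <- "b"
levels(f)
# [1] "A" "b" "C" "D"
```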

edited Apr 28 '11 at 20:37

answered Apr 28 '11 at 20:03

Marek


Short addition: The usage of %in% only really helps if you have a set on the right side, as `c("B","C")`. Doing `junk$nm[junk$nm == "B"]` is the better way. – Thilo Apr 28 '11 at 20:14
1

Oh, another, important addition: Doing it like this requires first adding the factor level `b` to the factor nm. diliop's version is in fact the better one if you want to work with characters, not factors. (Always think about the type your variables have first!) – Thilo Apr 28 '11 at 20:18
That doesn't work on the data as created by @Kenny because the data are factors. Did you forget a step or do you have the global setting to stop converting characters to factors? – Gavin Simpson Apr 28 '11 at 20:27
4

@Thilo One of the important differences between `%in%` and `==` is `NA` handling: `c(1,2,NA)==1` gives `TRUE, FALSE, NA` but `c(1,2,NA) %in% 1` gives `TRUE, FALSE, FALSE`. And yes, I forgot to check whether this works :/ – Marek Apr 28 '11 at 20:41
This helped me out as I was specifically looking for how to do this for all values in a character vector. Thanks. – The_Tams Nov 05 '21 at 16:22
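The NA-handling difference mentioned in the comments above can be checked directly. This is a small sketch on a made-up character vector x:

```r
x <- c("A", "B", NA)

eq  <- x == "B"     # comparison with NA stays NA
inn <- x %in% "B"   # %in% never returns NA

eq
# [1] FALSE  TRUE    NA
inn
# [1] FALSE  TRUE FALSE
which(eq)           # which() drops the NA and keeps only TRUE positions
# [1] 2
```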

Gavin Simpson · Answer 4 · 2011-04-29T08:39:08.673

21

As the data you show are factors, it complicates things a little bit. @diliop's Answer approaches the problem by converting nm to a character variable. To get back to the original factors, a further step is required.

An alternative is to manipulate the levels of the factor in place.

> lev <- with(junk, levels(nm))
> lev[lev == "B"] <- "b"
> junk2 <- within(junk, levels(nm) <- lev)
> junk2
   nm val
1   A   a
2   b   b
3   C   c
4   D   d
5   A   e
6   b   f
7   C   g
8   D   h
9   A   i
10  b   j
11  C   k
12  D   l

That is quite simple and I often forget that there is a replacement function for levels().

Edit: As noted by @Seth in the comments, this can be done in a one-liner, without loss of clarity:

within(junk, levels(nm)[levels(nm) == "B"] <- "b")

edited Apr 29 '11 at 08:39

answered Apr 28 '11 at 20:36

Gavin Simpson


6

Nice. I didn't know about the replacement function for `levels()`. How about the one liner `junk <- within(junk, levels(nm)[levels(nm)=="B"] <- "b")`? – Apr 28 '11 at 22:26
2

@Marek **slaps head** Just goes to show that one shouldn't respond to comments on SO when it is well past one's bedtime. Let's try that again... – Gavin Simpson Apr 29 '11 at 08:36
@Seth Indeed - nice. Not sure why I separated the steps? Perhaps for exposition... – Gavin Simpson Apr 29 '11 at 08:37

score 12 · Answer 5 · edited Jan 07 '12 at 13:31


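One way to do this in a single command is to use which(). Note that this works as written only when nm is a character vector; if nm is a factor, "b" must already be one of its levels, otherwise the assignment produces NA with a warning:

junk$nm[which(junk$nm == "B")] <- "b"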

edited Jan 07 '12 at 13:31

answered Jan 07 '12 at 13:26

user1021713


score 5 · Answer 6 · answered Apr 28 '11 at 20:18


You have created a factor variable in nm, so you either need to avoid doing so or add an additional level to the factor. You should also avoid using <- in the arguments to data.frame().

Option 1:

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)
junk$nm[junk$nm == "B"] <- "b"

Option 2:

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"
junk
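One consequence of Option 2 is that the old level "B" stays in the factor even though no value uses it any more. A hedged sketch on a standalone factor f (an illustrative stand-in for junk$nm) shows this, along with droplevels() as one way to clean up afterwards:

```r
f <- factor(rep(LETTERS[1:4], 3))

levels(f) <- c(levels(f), "b")   # add the new level first
f[f == "B"] <- "b"               # now the assignment is valid

levels_before <- levels(f)       # "B" is still a level, but unused
levels_before
# [1] "A" "B" "C" "D" "b"

f <- droplevels(f)               # drop levels that no longer occur
levels(f)
# [1] "A" "C" "D" "b"
```

Whether the leftover level matters depends on what is done with the factor later, as the comments on this answer discuss.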


IRTFM


@DWin thanks for your input on the problem and the need to consider the type of variable. I accepted @diliop's answer because it was the first working one. I know there are a lot of issues over <- vs = but (if it can be answered briefly) why should = be used with `data.frame`? – DQdlM Apr 28 '11 at 20:34
You don't need to add `b` as a level, just change the level that is `B` to `b`. – Gavin Simpson Apr 28 '11 at 20:40
@KennyPeanuts: The column name is one issue. Look at `a <- data.frame(x<-1:10)`. Its column name is not `x` but rather a messy `x....1.10`. Better to use `data.frame(x=1:10)`. Then you know what your column name is. – IRTFM Apr 28 '11 at 20:52
@Gavin: Easier to add than to replace, and even easier not to make it a factor. – IRTFM Apr 28 '11 at 20:53
@Dwin Easier? I disagree - see my Answer for something simple. Adding levels can catch you out, say in modelling with `predict()`, which will complain if factor levels in new data don't match those used to fit the model. Cleaner in the long run to get the data formatted as you want, properly, than to rely on shortcuts. I agree it might be easier to not make it a factor, but if it already is one, or needs to be one for some modelling exercise... – Gavin Simpson Apr 28 '11 at 21:15
@Gavin: Wouldn't predict error out under _either_ the added or replaced level situation? – IRTFM Apr 28 '11 at 21:18
@DWin depends what modelling function you are using. With `lm()` adding or replacing a level works equally well. In say `rpart()` if the levels are not *exactly* the same it will fail. So it does depend on what functions one is using, but we can say that if you get your data structured how you want *before* any modelling is done, then `predict()` will always work. – Gavin Simpson Apr 28 '11 at 21:28
@DWin thanks for the clarification on = vs <-, that makes sense. So far I have learned a lot of unanticipated things from the comments... this is great. – DQdlM Apr 29 '11 at 01:04

AnilGoyal · Answer 7 · 2021-03-31T05:12:23.997

4

You can also use ifelse, which is very simple to understand. Here the replacement is written to val wherever nm is "B":

junk$val <- ifelse(junk$nm == "B", "b", junk$val)
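If the goal is to change nm itself, as in the question, the same ifelse idea can be sketched as follows. Wrapping the fallback in as.character() is an assumption added so the line behaves the same whether nm is a factor or a character vector; the result is a character column.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12])

# Keep the existing value unless it is "B"
junk$nm <- ifelse(junk$nm == "B", "b", as.character(junk$nm))
junk$nm
```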

If you still want to do it with a for loop, the correct way is to loop over row indices:

for(i in 1:nrow(junk)){
  if(junk[i, "nm"] == "B"){
    junk[i, "val"] <- "b"
  }
}

> junk
   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   b
7   C   g
8   D   h
9   A   i
10  B   b
11  C   k
12  D   l
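The loop in the question fails because it iterates over the values of junk$nm, so it never knows which position it is looking at, and junk$nm <- "b" then overwrites the whole column. A sketch of the same loop applied to nm, iterating over positions instead, could look like this (stringsAsFactors = FALSE is set explicitly as an assumption, so that nm is a character column on any R version):

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = FALSE)

for (i in seq_len(nrow(junk))) {   # loop over row positions, not values
  if (junk$nm[i] == "B") {
    junk$nm[i] <- "b"              # assign to one element only
  }
}
junk$nm
```

The vectorised form junk$nm[junk$nm == "B"] <- "b" does the same thing without an explicit loop.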

edited Mar 31 '21 at 05:12

answered Mar 31 '21 at 05:07

AnilGoyal


score 2 · Answer 8 · answered Feb 20 '18 at 15:28


If you are working with character variables (note that stringsAsFactors is false here) you can use replace:

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12], stringsAsFactors = FALSE)
colnames(junk) <- c("nm", "val")

junk$nm <- replace(junk$nm, junk$nm == "B", "b")
junk
#    nm val
# 1   A   a
# 2   b   b
# 3   C   c
# 4   D   d
# ...

answered Feb 20 '18 at 15:28

loki


This works for all variable types. I just used it to insert NA at certain indices in an integer vector. – Jonas Lindeløv Oct 20 '20 at 13:07
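The use case described in the comment above, inserting NA at given positions of an integer vector, can be sketched like this. The vector x and the chosen positions are made up for illustration.

```r
x <- c(10L, 20L, 30L, 40L)

# replace() returns a modified copy; x itself is unchanged
y <- replace(x, c(2, 4), NA)
y
# [1] 10 NA 30 NA
```

Here the second argument is a vector of positions rather than a logical condition; replace() accepts either.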

score 1 · Answer 9 · answered Apr 16 '21 at 13:55

I had the same issue. You can also apply the same replacement to every column:

fix_junk <- function(x) {
  # x <- as.character(x)  # uncomment if the columns are factors
  x[x == "B"] <- "b"
  x
}
junk[] <- lapply(junk, fix_junk); junk  # junk[] keeps a data frame rather than a list
junk[1:2] <- lapply(junk[1:2], fix_junk); junk  # or only selected columns (junk has two)
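When a data frame mixes character and numeric columns, one possible variation is to restrict the replacement to the character columns. The data frame df and its column names below are hypothetical.

```r
# Hypothetical data frame with two character columns and one numeric column
df <- data.frame(nm = c("A", "B", "C"), code = c("B", "B", "D"), n = c(1, 2, 3),
                 stringsAsFactors = FALSE)

is_chr <- vapply(df, is.character, logical(1))   # TRUE for character columns
df[is_chr] <- lapply(df[is_chr], function(col) replace(col, col == "B", "b"))
df
```

Numeric columns are skipped entirely, so they keep their type.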

score 0 · Answer 10 · edited Apr 08 '19 at 07:10


stata.replace<-function(data,replacevar,replacevalue,ifs) {
  ifs=parse(text=ifs)
  yy=as.numeric(eval(ifs,data,parent.frame()))
  x=sum(yy)
  data=cbind(data,yy)
  data[yy==1,replacevar]=replacevalue
  message=noquote(paste0(x, " replacement are made"))
  print(message)
  return(data[,1:(ncol(data)-1)])
}

Call the function like this:

d=stata.replace(d,"under20",1,"age<20")
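For comparison, the conditional replacement in that call can also be sketched in plain base R without a helper function. The data frame d and its values are made up; only the column names age and under20 come from the call above.

```r
# Hypothetical data; under20 starts at 0 for every row
d <- data.frame(age = c(15, 22, 19, 40), under20 = 0)

cond <- d$age < 20            # rows that satisfy the condition
d$under20[cond] <- 1          # replace only in those rows
message(sum(cond), " replacements made")
d
```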

edited Apr 08 '19 at 07:10

Jitesh Prajapati


answered Apr 08 '19 at 06:47

Devendra Karanjit

