151

In the R data frame created by the code below, I would like to replace every occurrence of B with b.

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12])
colnames(junk) <- c("nm", "val")

This produces:

   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   f
7   C   g
8   D   h
9   A   i
10  B   j
11  C   k
12  D   l

My initial attempt was to use for and if statements like so:

for(i in junk$nm) if(i %in% "B") junk$nm <- "b"

but, as I am sure you can see, this replaces ALL of the values of junk$nm with b. I can see why it does this, but I can't seem to get it to replace only those cases of junk$nm where the original value was B.

NOTE: I managed to solve the problem with gsub, but in the interest of learning R I would still like to know how to get my original approach to work (if it is possible).

DQdlM
  • 1
    you might want to add stringsAsFactors = FALSE to the original data.frame construction. – jimmyb Apr 28 '11 at 21:00
  • @jimmyb Why? Factors are useful, and necessary if one is modelling with most of R's modelling code. The correct way of dealing with this is to acknowledge that the data are a factor. If you don't want/need this conversion then you can do as you say. If you do want the factor, then there are easy ways to do the manipulation @Kenny wants to perform. – Gavin Simpson Apr 28 '11 at 21:18
  • 1
    Factors used to be more popular because of performance. However, now that strings are immutable and hashed, the value of factors is less obvious, as most of the base R functionality will just convert them (albeit with warnings) directly. I think factors account for a significant number of the bugs that I find in people's R code. – jimmyb Apr 29 '11 at 00:15
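
Whether nm is a factor or a character vector decides which of the approaches in this thread apply, so it can help to check the column type directly. A minimal sketch, with stringsAsFactors set explicitly so the result does not depend on the R version (the default changed from TRUE to FALSE in R 4.0.0); the names junk_f and junk_c are made up for illustration:

```r
# Build the same data twice: once with factors, once with plain strings.
junk_f <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                     stringsAsFactors = TRUE)
junk_c <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                     stringsAsFactors = FALSE)

class(junk_f$nm)   # "factor"
class(junk_c$nm)   # "character"
levels(junk_f$nm)  # "A" "B" "C" "D"
```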

10 Answers

254

Easier to convert nm to characters and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: And if you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)
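
Put together, the round trip might look like the sketch below. It assumes nm starts out as a factor (forced here with stringsAsFactors = TRUE, which is an addition for reproducibility). Note that as.factor() rebuilds the levels from the sorted unique values, so the level order afterwards can depend on the locale.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

junk$nm <- as.character(junk$nm)   # factor -> character
junk$nm[junk$nm == "B"] <- "b"     # replace only the matching elements
junk$nm <- as.factor(junk$nm)      # back to a factor; levels are rebuilt

levels(junk$nm)   # "B" is gone, "b" is present
table(junk$nm)    # three rows per level
```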
diliop
48

Another useful way to replace values:

library(plyr)
junk$nm <- revalue(junk$nm, c("B"="b"))
qwr
Oriol Prat
30

Short answer is:

junk$nm[junk$nm %in% "B"] <- "b"

Take a look at the Index vectors section of An Introduction to R (if you haven't read it yet).

EDIT: As noted in the comments, this solution works for character vectors, so it fails on your data, where nm is a factor.

For a factor, the best way is to change the level:

levels(junk$nm)[levels(junk$nm)=="B"] <- "b"
Marek
  • Short addition: The usage of %in% only really helps if you have a set on the right side, as `c("B","C")`. Doing `junk$nm[junk$nm == "B"]` is the better way. – Thilo Apr 28 '11 at 20:14
  • 1
    Oh, another, important addition: Doing it like this requires first adding the factor level `b` to the factor nm. diliop's version is in fact the better one if you want to work with characters, not factors. (Always think about the type your variables have first!) – Thilo Apr 28 '11 at 20:18
  • that doesn't work on the data as created by @Kenny because the data are factors. Did you forget a step or do you have the global setting to stop converting characters to factors? – Gavin Simpson Apr 28 '11 at 20:27
  • 4
    @Thilo One of the important differences between `%in%` and `==` is `NA` handling: `c(1,2,NA)==1` gives `TRUE, FALSE, NA` but `c(1,2,NA) %in% 1` gives `TRUE, FALSE, FALSE`. And yes, I forgot to check whether this works :/ – Marek Apr 28 '11 at 20:41
  • This helped me out as I was specifically looking for how to do this for all values in a character vector. Thanks. – The_Tams Nov 05 '21 at 16:22
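
The NA difference between `==` and `%in%` mentioned in the comments can be checked on a small character vector. This is only an illustrative sketch; the vector x is made up:

```r
x <- c("A", "B", NA, "B")

x == "B"      # FALSE TRUE NA TRUE
x %in% "B"    # FALSE TRUE FALSE TRUE

# The masks behave differently when used for subsetting:
# an NA in the mask yields an NA element in the result.
x[x == "B"]    # "B" NA "B"
x[x %in% "B"]  # "B" "B"
```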
21

As the data you show are factors, it complicates things a little bit. @diliop's Answer approaches the problem by converting nm to a character variable. To get back to the original factor, a further step is required.

An alternative is to manipulate the levels of the factor in place.

> lev <- with(junk, levels(nm))
> lev[lev == "B"] <- "b"
> junk2 <- within(junk, levels(nm) <- lev)
> junk2
   nm val
1   A   a
2   b   b
3   C   c
4   D   d
5   A   e
6   b   f
7   C   g
8   D   h
9   A   i
10  b   j
11  C   k
12  D   l

That is quite simple and I often forget that there is a replacement function for levels().

Edit: As noted by @Seth in the comments, this can be done in a one-liner, without loss of clarity:

within(junk, levels(nm)[levels(nm) == "B"] <- "b")
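
One way to convince yourself that renaming a level touches only the label is to inspect the integer codes afterwards. A small sketch, assuming nm is a factor (forced here with stringsAsFactors = TRUE, which is an addition for reproducibility):

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

# Rename the level; every row that held "B" now shows "b".
junk <- within(junk, levels(nm)[levels(nm) == "B"] <- "b")

levels(junk$nm)       # "A" "b" "C" "D"
as.integer(junk$nm)   # codes unchanged: 1 2 3 4 repeated three times
anyNA(junk$nm)        # FALSE
```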
Gavin Simpson
  • 6
    Nice. I didn't know about the replacement function for `levels()`. How about the one liner `junk <- within(junk, levels(nm)[levels(nm)=="B"] <- "b")`? –  Apr 28 '11 at 22:26
  • 2
    @Marek **slaps head** Just goes to show that one shouldn't respond to comments on SO when it is well past one's bedtime. Let's try that again... – Gavin Simpson Apr 29 '11 at 08:36
  • @Seth Indeed - nice. Not sure why I separated the steps? Perhaps for exposition... – Gavin Simpson Apr 29 '11 at 08:37
12

You can also do this in one command with which(). This works directly when nm is a character vector; if nm is a factor, "b" must already be one of its levels, otherwise the assignment produces NA with a warning:

junk$nm[which(junk$nm=="B")]<-"b"
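
How this assignment behaves depends on the type of the vector. The sketch below compares a factor and a character vector; the names f and s are made up for illustration:

```r
f <- factor(c("A", "B", "C", "B"))
# "b" is not an existing level, so R warns
# ("invalid factor level, NA generated") and stores NA.
f[which(f == "B")] <- "b"
is.na(f)   # FALSE TRUE FALSE TRUE

s <- c("A", "B", "C", "B")
s[which(s == "B")] <- "b"
s          # "A" "b" "C" "b"
```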
user1021713
5

You have created a factor variable in nm, so you either need to avoid doing so or add an additional level to the factor. You should also avoid using <- in the arguments to data.frame().

Option 1:

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)
junk$nm[junk$nm == "B"] <- "b"

Option 2:

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"
junk
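
Option 2 leaves the old level "B" in place even though no row uses it any more. If that matters (for example for the modelling concerns raised in the comments), one possible follow-up is droplevels(). A sketch, assuming nm is a factor (forced here with stringsAsFactors = TRUE):

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

levels(junk$nm) <- c(levels(junk$nm), "b")  # add the new level first
junk$nm[junk$nm == "B"] <- "b"              # now the assignment is valid

levels(junk$nm)                 # "A" "B" "C" "D" "b"  ("B" is now unused)
junk$nm <- droplevels(junk$nm)  # remove levels that no row uses
levels(junk$nm)                 # "A" "C" "D" "b"
```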
IRTFM
  • @DWin thanks for your input on the problem and the need to consider the type of variable. I accepted @diliop's answer because it was the first working one. I know there are a lot of issues over <- vs = but (if it can be answered briefly) why should = be used with `data.frame`? – DQdlM Apr 28 '11 at 20:34
  • You don't need to add `b` as a level, just change the level that is `B` to `b`. – Gavin Simpson Apr 28 '11 at 20:40
  • @KennyPeanuts: the column name is one issue, Look at `a <- data.frame(x<-1:10)` . Its column name is not `x` but rather a messy `x....1.10`. Better to use data.frame(x=1:10). Then you know what your column name is. – IRTFM Apr 28 '11 at 20:52
  • @Gavin: Easier to add than to replace, and even easier not to make it a factor. – IRTFM Apr 28 '11 at 20:53
  • @Dwin Easier? I disagree - see my Answer for something simple. Adding levels can catch you out, say in modelling with `predict()` which will complain if factors levels in new data don't match those used to fit the model. Cleaner in long run to get the data formatted as you want, properly, than rely on short cuts. I agree it might be easier to not make it a factor, but if it already is one, or needs to be one for some modelling exercise... – Gavin Simpson Apr 28 '11 at 21:15
  • @Gavin: Wouldn't predict error out under _either_ the added or replaced level situation? – IRTFM Apr 28 '11 at 21:18
  • @DWin depends what modelling function you are using. With `lm()` adding or replacing a level works equally well. In say `rpart()` if the levels are not *exactly* the same it will fail. So it does depend on what functions one is using, but we can say that if you get your data structured how you want *before* any modelling is done, then `predict()` will always work. – Gavin Simpson Apr 28 '11 at 21:28
  • @DWin thanks for the clarification on = vs <-, that makes sense. So far I have learned a lot of unanticipated things from the comments... this is great. – DQdlM Apr 29 '11 at 01:04
4

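You can use ifelse too, which is very simple to understand. Note that this example overwrites val (rather than nm) in the rows where nm is "B", and it assumes val is a character column: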

junk$val <- ifelse(junk$nm == "B", "b", junk$val)

If you still want to do it with a for loop, a correct way of doing it is:

for(i in 1:nrow(junk)){
  if(junk[i, "nm"] == "B"){
    junk[i, "val"] <- "b"
  }
}

junk
> junk
   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   b
7   C   g
8   D   h
9   A   i
10  B   b
11  C   k
12  D   l
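
The loop in the question overwrote the whole column because `junk$nm <- "b"` assigns to every element at once. A minimal sketch of an index-based loop that changes nm itself, assuming nm is a character column (hence stringsAsFactors = FALSE, which is an addition here):

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = FALSE)

# Loop over row positions so that a single element can be assigned.
for (i in seq_len(nrow(junk))) {
  if (junk$nm[i] == "B") {
    junk$nm[i] <- "b"
  }
}

junk$nm   # "A" "b" "C" "D" repeated three times
```

The vectorised form `junk$nm[junk$nm == "B"] <- "b"` does the same thing without an explicit loop.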
AnilGoyal
2

If you are working with character variables (note that stringsAsFactors is false here) you can use replace:

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12], stringsAsFactors = FALSE)
colnames(junk) <- c("nm", "val")

junk$nm <- replace(junk$nm, junk$nm == "B", "b")
junk
#    nm val
# 1   A   a
# 2   b   b
# 3   C   c
# 4   D   d
# ...
loki
1

I had the same issue. You can also apply the same replacement to every column:

fix_junk <- function(x) {
  # x <- as.character(x)  # uncomment if the columns are factors
  x[x == "B"] <- "b"
  x
}
junk[] <- lapply(junk, fix_junk); junk  # junk[] keeps a data frame rather than a list
junk[1:2] <- lapply(junk[1:2], fix_junk); junk  # or only selected columns (junk has two)
Seyma Kalay
0
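stata.replace <- function(data, replacevar, replacevalue, ifs) {
  # Parse the condition string (e.g. "age<20") and evaluate it within `data`
  ifs <- parse(text = ifs)
  yy <- as.numeric(eval(ifs, data, parent.frame()))
  x <- sum(yy)  # number of rows that match the condition
  data <- cbind(data, yy)
  # Overwrite the target column in the matching rows
  data[yy == 1, replacevar] <- replacevalue
  message <- noquote(paste0(x, " replacements made"))
  print(message)
  # Drop the helper column before returning
  return(data[, 1:(ncol(data) - 1)])
}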

Call this function as in the line below:

d=stata.replace(d,"under20",1,"age<20")
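
For comparison, the same conditional replacement can be written with plain logical indexing. This is only a sketch: the data frame d and its columns age and under20 are hypothetical, chosen to mirror the names in the call above.

```r
# Hypothetical data; under20 starts at 0 for every row.
d <- data.frame(age = c(15, 22, 19, 40), under20 = 0)

# Set under20 to 1 only in the rows where age < 20.
d$under20[d$age < 20] <- 1

d$under20   # 1 0 1 0
```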
Jitesh Prajapati