47

I have a table that has two columns: whether you were sick (H01) and the number of days sick (H03). However, the number of days sick is NA if H01 == false, and I would like to set it to 0. When I do this:

test <- pe94.person[pe94.person$H01 == 12,]
test$H03 <- 0

It works fine. However, I'd like to replace the values in the original dataframe. This, however, fails:

pe94.person[pe94.person$H01 == 12,]$H03 <- 0

It returns:

> pe94.person[pe94.person$H01 == 12,]$H03 <- 0
Error in `[<-.data.frame`(`*tmp*`, pe94.person$H01 == 12, , value = list( : 
  missing values are not allowed in subscripted assignments of data frames

Any idea why this is? For what it's worth, here's a frequency table:

> table(pe94.person[pe94.person$H01 == 12,]$H03)

 2  3  5 28 
 3  1  1  1 
Andrew Min
  • 8
    Most likely because you have `NA`s in the column `H01`. Note the `useNA` argument to table, which you haven't used. Also, it's probably better (stylistically) to reference the column inside `[` rather than using `$`. – joran Apr 30 '14 at 19:19
  • 1
    That makes sense; I figured as much. How would I replace the NAs? Sorry, I don't have a lot of experience with R. – Andrew Min Apr 30 '14 at 19:22
  • 2
    `pe94.person$H01[is.na(pe94.person$H01)] <- value` probably. – joran Apr 30 '14 at 19:22
  • Hmm, but I can't replace all the NAs because some NAs simply denote missing values (i.e. the person didn't respond). Also, I'm a bit confused why splitting it up works (using the test variable), but doing it in one line does not? – Andrew Min Apr 30 '14 at 19:25
  • 1
    Because when you split it up R doesn't have to infer what row an index of `NA` goes with; you're simply telling it to re-assign all the values in a particular column. – joran Apr 30 '14 at 19:37
  • @joran beat me to the comment! (+1) and see my answer to preserve the NA's and still assign values for other conditions. – infominer Apr 30 '14 at 19:39
  • Take a look at my answer for, hopefully, some additional insights. – Thomas Apr 30 '14 at 19:44
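
A minimal sketch of the `useNA` point from the comments, assuming a made-up stand-in for `pe94.person` (the data frame `df` and its values are hypothetical, not the asker's data). It shows how `table()` hides missing values by default and how `useNA = "ifany"` exposes them:

```r
# Hypothetical toy data standing in for pe94.person
df <- data.frame(H01 = c(12, 12, NA, 1, NA),
                 H03 = c(NA, NA, NA, 3, 5))

# By default table() silently drops NA values
counts_default <- table(df$H01)

# useNA = "ifany" adds an NA category whenever missing values are present
counts_with_na <- table(df$H01, useNA = "ifany")
print(counts_with_na)
```

If the second table shows an `NA` column, the logical index built with `==` will contain `NA` as well, which is what triggers the error.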

6 Answers

43

This is due to missing values in the H01 variable.

> x <- data.frame(a=c(NA,2:5), b=c(1:5))
> x
   a b
1 NA 1
2  2 2
3  3 3
4  4 4
5  5 5
> x[x$a==2,]$b <- 99
Error in `[<-.data.frame`(`*tmp*`, x$a == 2, , value = list(a = NA_integer_,  : 
  missing values are not allowed in subscripted assignments of data frames

The assignment won't work because x$a has a missing value.

Subsetting first works:

> z <- x[x$a==2,]
> z$b <- 99
> z
    a  b
NA NA 99
2   2 99
2   2  2

That works because `z$b <- 99` replaces a whole column and involves no row index. The `[<-` method for data frames, however, does not accept missing values in its row index, even though `[` does:

> `[<-`(x,x$a==2,,99)
Error in `[<-.data.frame`(x, x$a == 2, , 99) : 
  missing values are not allowed in subscripted assignments of data frames

So instead, try adding a !is.na(x$a) condition when you do the assignment:

> `[<-`(x,!is.na(x$a) & x$a==2,'b',99)
   a  b
1 NA  1
2  2 99
3  3  3
4  4  4
5  5  5

Or, more commonly:

> x[!is.na(x$a) & x$a==2,]$b <- 99
> x
   a  b
1 NA  1
2  2 99
3  3  3
4  4  4
5  5  5

Note that this behavior is described in the documentation:

The replacement methods can be used to add whole column(s) by specifying non-existent column(s), in which case the column(s) are added at the right-hand edge of the data frame and numerical indices must be contiguous to existing indices. On the other hand, rows can be added at any row after the current last row, and the columns will be in-filled with missing values. Missing values in the indices are not allowed for replacement.

Thomas
  • 4
    You can also get around missing values by using the `%in%` operator instead of `==`, see [here](https://stackoverflow.com/q/16822426/4241780) for an explanation. So either `x[x$a %in% 2,]$b <- 99`, or for the OPs example `pe94.person[pe94.person$H01 %in% 12,]$H03 <- 0`, would work. – JWilliman Jan 25 '18 at 22:49
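
A short sketch of why the `%in%` suggestion avoids the error, using the same toy data frame `x` as in the answer above. The variable names `idx_eq` and `idx_in` are introduced here only for illustration:

```r
x <- data.frame(a = c(NA, 2:5), b = 1:5)

# == propagates NA, so the logical index contains a missing value
idx_eq <- x$a == 2      # NA TRUE FALSE FALSE FALSE

# %in% is built on match() and yields FALSE where a is NA
idx_in <- x$a %in% 2    # FALSE TRUE FALSE FALSE FALSE

# With no NA in the index, the data frame assignment goes through
x[idx_in, "b"] <- 99
print(x)
```

Because `idx_in` never contains `NA`, it can be used directly as a row index in an assignment.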
13

You can use ifelse, like so:

pe94.person$foo <- ifelse(!is.na(pe94.person$H01) & pe94.person$H01 == 12, 0, pe94.person$H03)

Check that foo meets your criteria, and then go ahead and assign it to pe94.person$H03 directly. I find it safer to assign the result to a new variable and use that in subsequent analysis.
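
As a rough illustration of the `ifelse` approach, here is the same expression on a small made-up data frame (`df` and its values are hypothetical). Rows where `H01` is 12 get 0; every other row, including those where `H01` is `NA`, keeps its original `H03`:

```r
# Hypothetical toy data; H01 == 12 is the case that should get 0 days
df <- data.frame(H01 = c(12, 1, NA, 12, 1),
                 H03 = c(NA, 3, NA, NA, 5))

# The !is.na() guard makes the test FALSE (not NA) for missing H01,
# so those rows fall through to the original H03 value
df$foo <- ifelse(!is.na(df$H01) & df$H01 == 12, 0, df$H03)
print(df)
```

Row 3 stays `NA` in `foo`, which preserves the "did not respond" cases mentioned in the comments.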

infominer
5

There might be an NA somewhere in the column that is causing the error. Run the index on a specific column instead of the entire data frame.

movies[movies$Actors == "N/A",] = NA #ERROR
movies$Actors[movies$Actors == "N/A"] = NA #Works
James L.
4

I realise the question is very old, but I think the most elegant solution is by using the which() function:

 pe94.person[which(pe94.person$H01 == 12),]$H03 <- 0

This should do what the original poster asked for, because which() drops the NAs and keeps only the positions of the TRUE results.
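
A minimal sketch of the `which()` approach on a toy data frame (the `x` below is the example from an earlier answer, reused as an assumption; `rows` is a name introduced here):

```r
x <- data.frame(a = c(NA, 2:5), b = 1:5)

# which() returns the integer positions of TRUE only; NA is dropped
rows <- which(x$a == 2)

# Integer positions contain no NA, so the assignment is allowed
x[rows, "b"] <- 99
print(x)
```

One caveat worth keeping in mind: if nothing matches, `which()` returns `integer(0)`. That is harmless for an assignment like this one, but negating it (as in `x[-which(cond), ]`) would then select zero rows.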

2

Simply use the subset() function, which excludes rows where the condition evaluates to NA.

It works as x[subset & !is.na(subset)]. Look at this data:

> x <- data.frame(a = c(T,F,T,F,NA,F,T, F, NA,NA,T,T,F),
+                 b = c(F,T,T,F,T, T,NA,NA,F, T, T,F,F))

Subsetting with [ operator returns this:

> x[x$b == T & x$a == F, ]

         a    b
2    FALSE TRUE
NA      NA   NA
6    FALSE TRUE
NA.1    NA   NA
NA.2    NA   NA

And subset() does what we want:

> subset(x, b == T & a == F)

      a    b
2 FALSE TRUE
6 FALSE TRUE

To change the values of subsetted variables:

> ss <- subset(x, b == T & a == F)
> x[rownames(ss), 'a'] <- T

> x[c(2,6), ]

     a    b
2 TRUE TRUE
6 TRUE TRUE
2

The following works. Note that there is no comma in the subsetting:

x <- data.frame(a=c(NA,2:5), b=c(1:5))

x$a[x$a==2] <- 99
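
A sketch of why the vector form tolerates the `NA`, applied here to column `b` to mirror the original question (that choice, and the names `v` and `res`, are assumptions for illustration). For plain vectors, an `NA` in a logical index selects nothing to replace, but R only permits this when the replacement value has length one:

```r
x <- data.frame(a = c(NA, 2:5), b = 1:5)

# Column-level assignment: the NA in the index is skipped
# because the replacement (99) has length one
x$b[x$a == 2] <- 99
print(x)

# With a longer replacement, R cannot tell which value the NA
# position should consume, so the assignment is rejected
res <- tryCatch({
  v <- 1:5
  v[c(NA, TRUE, FALSE, FALSE, FALSE)] <- c(10, 20)
  "ok"
}, error = function(e) "error")
print(res)
```

So the no-comma form is convenient for scalar replacements, but an explicit `!is.na()` guard is still needed when assigning a vector of values.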
mskfisher