
My recode attempts

df$test[(df$1st==(1:3) & df$2nd <= 4)] <- 1
df$test[(df$1st==(1:3) & df$2nd <= 5)] <- 2
df$test[(df$1st==(1:3) & df$2nd <= 6)] <- 3

result in a "longer object length is not a multiple of shorter object length" warning and a lot of NAs in df$test, even though some recodes work correctly.
What am I missing? Any help appreciated.

Marek
dw006

3 Answers


The problem is in this line:

df$1st==(1:3)

You could use %in% instead:

df$1st %in% (1:3)

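The warning appears because you are comparing vectors of different lengths (1:3 has length 3, while df$1st has whatever length your data has). R recycles the shorter vector, so each row is compared with only one of 1, 2 or 3, depending on its position, rather than with all three.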
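A small illustration of the recycling, using a hypothetical six-element vector `first` as a stand-in for df$1st (the values are made up for this example):

```r
# Hypothetical stand-in for df$1st: six values, so 1:3 recycles exactly twice
first <- c(1, 2, 3, 3, 2, 1)

# `==` pairs elements by position: first[4] is compared with 1,
# first[5] with 2 and first[6] with 3, because 1:3 is recycled
eq_result <- first == 1:3
print(eq_result)  # TRUE TRUE TRUE FALSE TRUE FALSE

# `%in%` checks each element against the whole set 1:3
in_result <- first %in% 1:3
print(in_result)  # all TRUE

# With seven values the lengths no longer divide evenly, so R also warns;
# tryCatch() captures the warning text instead of printing it
first7 <- c(first, 2)
msg <- tryCatch(first7 == 1:3, warning = function(w) conditionMessage(w))
print(msg)
```

Rows 4 and 6 hold values that are in 1:3, yet `==` reports FALSE for them, which is exactly the kind of partially correct recode described in the question.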

Besides, I think you missed that your values are overwritten: every row with df$2nd <= 4 also satisfies df$2nd <= 5 and df$2nd <= 6, so all the 1s and 2s are overwritten by 3.
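To see the overwriting concretely, here is a sketch on a small made-up data frame; the column names `first` and `second` are assumed stand-ins for 1st and 2nd. Applying the cumulative `<=` conditions from the smallest threshold to the largest lets the last assignment win, while applying them from the largest to the smallest is one possible way to keep the tighter codes:

```r
# Made-up data; `first`/`second` stand in for the question's 1st/2nd
df <- data.frame(first  = c(1, 2, 3, 1, 2, 5),
                 second = c(4, 5, 6, 7, 4, 4))

in_range <- df$first %in% 1:3

# Smallest threshold first: any row with second <= 4 also matches
# second <= 6, so the final assignment (3) overwrites the earlier codes
df$overwritten <- NA
df$overwritten[in_range & df$second <= 4] <- 1
df$overwritten[in_range & df$second <= 5] <- 2
df$overwritten[in_range & df$second <= 6] <- 3

# Largest threshold first: the tighter conditions run last and win
df$test <- NA
df$test[in_range & df$second <= 6] <- 3
df$test[in_range & df$second <= 5] <- 2
df$test[in_range & df$second <= 4] <- 1

print(df)
```

Rows that match no condition (row 4 has second = 7, row 6 has first = 5) keep the initial NA in both columns.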

Marek
  • Sorry, the overwriting only happens in my example, which I wrote down too quickly and erroneously. – dw006 Dec 13 '10 at 12:00

I am not sure what you're trying to achieve with df$1st==(1:3), but it probably doesn't do what you think it does. It recycles c(1,2,3) as many times as needed to make it as long as df$1st, and then compares the two vectors element by element.

If you are trying to check if df$1st is between 1 and 3, you might want to spell it out:

df$1st>=1 & df$1st<=3
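The range check and `%in% 1:3` agree for whole numbers but not in general. A quick comparison on a made-up vector `x` (the variable names are just for this illustration):

```r
# Made-up values, including a non-integer and a missing value
x <- c(0.5, 1, 1.5, 2, 3, 3.5, NA)

# Range check: TRUE for anything between 1 and 3, NA stays NA
in_between <- x >= 1 & x <= 3

# Set membership: TRUE only for exactly 1, 2 or 3; NA gives FALSE
member <- x %in% 1:3

print(rbind(x, in_between, member))
```

For 1.5 the two checks disagree, and for a missing value the range check returns NA while `%in%` returns FALSE, so which one fits depends on the data in the 1st column.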
NPE

You may also want to consider using transform() to deal with recoding problems such as this. transform() will be slower than the logical indexing method, but it makes the intent of the code easier to digest. A good discussion of the pros and cons of the different methods can be found here. Consider:

set.seed(42)
# 10e5 = 1,000,000 rows of sample data
df <- data.frame("first" = sample(1:5, 10e5, TRUE), "second" = sample(4:8, 10e5, TRUE))

# nested ifelse(): the first matching condition wins; rows matching none get NA
df <- transform(df
    , test =      ifelse(first %in% 1:3 & second == 4, 1
            , ifelse(first %in% 1:3 & second == 5, 2
            , ifelse(first %in% 1:3 & second == 6, 3, NA)))
    )
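For comparison, a sketch of the same recode written with logical indexing, on a smaller 1,000-row sample so that it runs quickly (the sample size and the names `via_transform` and `via_index` are assumptions for this illustration). Both versions should produce the same vector:

```r
set.seed(42)
df <- data.frame(first = sample(1:5, 1000, TRUE), second = sample(4:8, 1000, TRUE))

# Version 1: transform() with nested ifelse()
via_transform <- transform(df,
  test = ifelse(first %in% 1:3 & second == 4, 1,
         ifelse(first %in% 1:3 & second == 5, 2,
         ifelse(first %in% 1:3 & second == 6, 3, NA))))$test

# Version 2: start from all-NA and assign into each matching subset
in_range <- df$first %in% 1:3
via_index <- rep(NA_real_, nrow(df))
via_index[in_range & df$second == 4] <- 1
via_index[in_range & df$second == 5] <- 2
via_index[in_range & df$second == 6] <- 3

identical(via_transform, via_index)
```

Each nested ifelse() computes its branches over the full vector, which is one likely reason it tends to be slower than assigning into subsets.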

Secondly, 1st and 2nd are not syntactically valid column names, so df$1st as written is a syntax error; names like these have to be quoted with backticks (df$`1st`) or accessed as df[["1st"]]. Take a look at make.names() for more details on what constitutes a valid name. When creating a data.frame, you can use (or abuse) the check.names argument. For example:

> df <- data.frame("1st" = sample(1:5, 10e5, TRUE), "2nd" = sample(4:8, 10e5, TRUE), check.names = FALSE)
> colnames(df)
[1] "1st" "2nd"
> df <- data.frame("1st" = sample(1:5, 10e5, TRUE), "2nd" = sample(4:8, 10e5, TRUE), check.names = TRUE)
> colnames(df)
[1] "X1st" "X2nd"
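If you keep the original names, here is a sketch of how to refer to them; the toy values and the `test` column are made up for the example:

```r
# check.names = FALSE keeps the non-syntactic names "1st" and "2nd"
df <- data.frame("1st" = c(1, 2, 3, 4), "2nd" = c(4, 5, 6, 7), check.names = FALSE)

# Backticks quote a non-syntactic name after `$`
df$test <- NA
df$test[df$`1st` %in% 1:3 & df$`2nd` == 4] <- 1

# `[[` with a character string works as well
first_col <- df[["1st"]]

# make.names() shows the names that check.names = TRUE would produce
fixed <- make.names(c("1st", "2nd"))
print(fixed)  # "X1st" "X2nd"
```

Renaming once with names(df) <- make.names(names(df)) avoids the backticks from then on.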
Chase