Assuming I have a dataset:

    X   Y
1   0 500
2 125 375
3 250 250
4 375 125
5 500 500
6 750 250
  ....
  ....

which can be generated by:

df <- data.frame(X = c(0,125,250,375,500,750), Y=c(500,375,250,125,500,250))

I need to assign a category value based on the numerical relationship of X and Y. For example:

if X=0, then assign label A
if Y>X and Y/X=3 then assign label B
if X=Y then assign label C
if X>Y and X/Y=3 then assign label D

So essentially, I am assigning labels based on the ratio of X and Y: 0, 0.25, 0.75, 1. The end result I am hoping for is:

    X   Y   Category
1   0 500   A
2 125 375   B
3 250 250   C
4 375 125   D
5 500 500   C
6 750 250   D
  ....
  ....

How should I accomplish this? Thanks

Oliver
  • What have you tried so far? Please share some of the code you've attempted rather than just asking for us to write it for you... – Justin Oct 01 '13 at 13:56
  • @Oliver note that row 6 does not meet condition 3, so it can't be labeled C; it should be D. – Jilber Urbina Oct 01 '13 at 14:00
  • I have tried, unsuccessfully, with ratio = df$X/df$Y, then using plyr's mapvalues() to map from (Inf, 1, 3) to A, B, C. However, this approach fails when the ratio involves floating point. For example, X=250, Y=375. – Oliver Oct 01 '13 at 14:01
  • @Jilber, that was my mistake; it should be D, I think. – Oliver Oct 01 '13 at 14:02
  • Your ratios -- 0, 0.25, 0.75, 1 -- do not match up with your description (0, 1/3, 1, 3); and your inequalities are redundant if you have only positive values (as it seems...). – Frank Oct 01 '13 at 14:26

3 Answers

Using the data.table package

library(data.table)
df <- data.table(X = c(0,125,250,375,500,750), Y=c(500,375,250,125,500,250))

# if X=0, then assign label A
df[X ==0, Label := "A"]
# if Y>X and Y/X=3 then assign label B
df[Y > X & Y/X == 3, Label := "B"]
# if X=Y then assign label C
df[Y == X, Label := "C"]
# if X>Y and X/Y=3 then assign label D
df[X > Y & X/Y == 3, Label := "D"]

#      X   Y Label
# 1:   0 500     A
# 2: 125 375     B
# 3: 250 250     C
# 4: 375 125     D
# 5: 500 500     C
# 6: 750 250     D

And using @Jilber's approach with data.table:

df[, Label := ifelse(X > Y & X/Y == 3, "D",
                ifelse(Y == X, "C",
                  ifelse(Y > X & Y/X == 3, "B", "A")))]
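
One thing to watch with the nested form: its last branch is a catch-all, so any row that matches none of the rules is silently labeled "A". Below is a minimal sketch of the same chain with every rule tested explicitly and NA as the fallback. It uses base `ifelse` on plain vectors so it runs without data.table. The extra row (250, 375) is added only for illustration, and the `Y > X` / `X > Y` guards are dropped on the assumption that all values are non-negative.

```r
# Sample data plus one extra row (250, 375) that matches none of the four rules.
X <- c(0, 125, 250, 375, 500, 750, 250)
Y <- c(500, 375, 250, 125, 500, 250, 375)

# Every label has its own test; rows that match nothing fall through to NA
# instead of silently receiving a label.
Label <- ifelse(X == 0, "A",
         ifelse(Y / X == 3, "B",
         ifelse(Y == X, "C",
         ifelse(X / Y == 3, "D", NA_character_))))

print(data.frame(X, Y, Label))
```

The same expression should work on the right-hand side of `Label :=` inside `df[, ...]`.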
TheComeOnMan

Or, using standard data frames:

df <- within(df, {
  label <- NA
  label[X == 0]           <- "A"
  label[Y > X & Y/X == 3] <- "B"
  label[Y == X]           <- "C"
  label[X > Y & X/Y == 3] <- "D"
})

This should update df with the required column.
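
The asker mentions in the comments that exact comparisons on a ratio break down with floating-point values. Here is a rough sketch of one way around that, assuming an absolute tolerance is acceptable: compare `Y` against `3 * X` (no division) through a small helper. The helper name `near`, the tolerance `1e-8`, and the extra row (0.1, 0.3) are all assumptions made for illustration, not part of the original question.

```r
# Sample data plus one non-integer row (0.1, 0.3) where Y is meant to be 3 * X.
df <- data.frame(X = c(0, 125, 250, 375, 500, 750, 0.1),
                 Y = c(500, 375, 250, 125, 500, 250, 0.3))

# Hypothetical helper: TRUE where a and b differ by less than tol.
# The absolute tolerance 1e-8 is an assumption; scale it to your data.
near <- function(a, b, tol = 1e-8) abs(a - b) < tol

df <- within(df, {
  label <- rep(NA_character_, length(X))
  label[near(Y, 3 * X)] <- "B"   # Y is three times X
  label[near(X, Y)]     <- "C"   # X equals Y
  label[near(X, 3 * Y)] <- "D"   # X is three times Y
  label[X == 0]         <- "A"   # assigned last so X == 0 always wins
})

print(df)
```

Rows that satisfy none of the rules keep `NA`, which makes unexpected inputs easy to spot.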

Sam Mason

Use nested ifelse calls (note that the final branch labels every remaining row D without testing X/Y == 3):

> transform(df, Category=ifelse(X==0, "A",
                                ifelse(Y>X & Y/X==3, "B", 
                                       ifelse(X==Y, "C", "D"))))
    X   Y Category
1   0 500        A
2 125 375        B
3 250 250        C
4 375 125        D
5 500 500        C
6 750 250        D
Jilber Urbina
  • If the data is actually large, triply nested `ifelse` calls might be horribly slow. Have a look [here](http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow/16275201). – Ricardo Saporta Oct 01 '13 at 14:08
  • Interesting use of ifelse. I learned something. The numerical relations seem more explicit with the data.table approach of Codoremifa. Thanks. – Oliver Oct 01 '13 at 14:17
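
As a possible alternative to nesting `ifelse`, the labels could be produced by a single vectorized `cut()` call. This sketch rests on an assumption that is not confirmed by the question: that the intended "ratio" is X's share of the total, X / (X + Y), which comes out as 0, 0.25, 0.5 and 0.75 for labels A to D in the sample data. The break points are chosen halfway between those values, so each row is binned to the nearest target rather than matched exactly.

```r
df <- data.frame(X = c(0, 125, 250, 375, 500, 750),
                 Y = c(500, 375, 250, 125, 500, 250))

# Assumption: the "ratio" is X's share of the total, X / (X + Y).
share <- df$X / (df$X + df$Y)

# Bin each share to the nearest target value (0, 0.25, 0.5, 0.75);
# the breaks sit halfway between neighbouring targets.
df$Category <- as.character(cut(share,
                                breaks = c(-Inf, 0.125, 0.375, 0.625, Inf),
                                labels = c("A", "B", "C", "D")))

print(df)
```

Because this bins instead of testing equality, it tolerates floating-point noise, but it also assigns a label to rows that match none of the original rules exactly; whether that is acceptable depends on the data.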