After speaking to a friend of mine, I'm no longer sure whether my solution to a simple test task is a good one with respect to how one should program in R. I'm quite new to R, so I could use some feedback on how to learn it correctly instead of just producing a lot of code.

My aim was simply to group the values in a column based on different ranges, so what I did was the following:

    #create a test df
    a<-factor(c("a","b","c","d","e","f","g"))
    b<-c(1,2,NA,4,5,6,7)
    c<-factor(c("a","a","a","d","e","f","a"))
    d<-c(1,7,1,7,2,5,4)
    df.abcd<-data.frame(a,b,c,d)
    df.abcd

    # apply groups in new column based on values in d 
    # groups are 0-2, 3-5, 6-7
    df.abcd$groups <-
      ifelse(df.abcd$d > -1 & df.abcd$d <= 2, "0-2",
             ifelse(df.abcd$d > 2 & df.abcd$d <= 5, "3-5",
                    ifelse(df.abcd$d > 5 & df.abcd$d <= 7, "6-7",
                           "outside the defined Ranges")))

This solution works well for me, but it is a lot of code and a lot of nested ifelse calls. Maybe there is a more elegant solution.

My friend told me that R is not designed for doing so much work inside data frames (in my case I add a new column) but for working with new objects directly. So he suggested creating the groups as separate objects, like Group1<- etc. Since I am learning R on my own and have no professor to teach me the right way (learning by doing), I want to avoid working against the logic of the language (if there is such a thing).

So any help and explanations would be appreciated. Best

Joschi

1 Answer

You can use cut() for this, and then adjust your factor levels:

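    # Bin d into (0,2], (2,5], (5,7]; values outside these breaks become NA
    df.abcd$groups <- cut(df.abcd$d, c(0,2,5,7))
    # Rename the three bins and add a fourth level for out-of-range values
    levels(df.abcd$groups) <- c("0-2", "3-5", "6-7", "Outside the defined range")
    df.abcd$groups[is.na(df.abcd$groups)] <- "Outside the defined range"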
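As a variation on the cut() approach, here is a minimal sketch that names the bins through the `labels` argument and uses `include.lowest = TRUE` so that a value of exactly 0 also falls into "0-2". The vector `d` below is made-up test data (not the column from the question), chosen to include the boundaries and one out-of-range value.

```r
# Hypothetical test vector: boundary values plus one value outside 0..7
d <- c(0, 1, 2, 3, 5, 6, 7, 9)

# Intervals are [0,2], (2,5], (5,7]; anything else becomes NA
groups <- cut(d, breaks = c(0, 2, 5, 7),
              labels = c("0-2", "3-5", "6-7"),
              include.lowest = TRUE)

# Turn the NAs into an explicit fourth category
groups <- as.character(groups)
groups[is.na(groups)] <- "Outside the defined range"
groups <- factor(groups,
                 levels = c("0-2", "3-5", "6-7", "Outside the defined range"))

table(groups)
```

Note that an NA in the input would also end up in the "Outside the defined range" category here, which may or may not be what you want.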

Or else you could use index vectors, for example if you didn't want to split a continuous range:

    # Start from an empty character column, then fill it range by range
    df.abcd$groups <- NA_character_
    df.abcd$groups[df.abcd$d > -1 & df.abcd$d <= 2] <- "0-2"
    df.abcd$groups[df.abcd$d > 2 & df.abcd$d <= 5] <- "3-5"
    df.abcd$groups[df.abcd$d > 5 & df.abcd$d <= 7] <- "6-7"
    df.abcd$groups[is.na(df.abcd$groups)] <- "Outside the defined range"
    df.abcd$groups <- as.factor(df.abcd$groups)

In general, looping and/or stacking a lot of ifelse() calls is not a good idea. Use index vectors and built-in R functions where possible.
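To illustrate the point about loops versus built-in functions, here is a rough sketch that does the same grouping twice: once element by element, and once in a single vectorised call. It uses the values of column d from the question; the names `looped`, `vectorised` and `labels` are just for this example, and findInterval() is one possible built-in for the job, not the only one.

```r
d <- c(1, 7, 1, 7, 2, 5, 4)
labels <- c("0-2", "3-5", "6-7")

# Element-by-element loop: works, but verbose
looped <- character(length(d))
for (i in seq_along(d)) {
  if (d[i] <= 2) {
    looped[i] <- "0-2"
  } else if (d[i] <= 5) {
    looped[i] <- "3-5"
  } else {
    looped[i] <- "6-7"
  }
}

# Vectorised: findInterval() returns the bin number of every element at once
# (bins are (-Inf,2], (2,5], (5,Inf) because of left.open = TRUE),
# and that number is used as an index into the labels
vectorised <- labels[findInterval(d, c(-Inf, 2, 5), left.open = TRUE)]

identical(looped, vectorised)
```

Both versions give the same result; the vectorised one is shorter and leaves the iteration to R's internals.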

Theodore Lytras

:D I knew there was an easier way. Thank you. But more generally: is it good to use a lot of "ifelses" and loops? My friend mentioned that using a lot of conditionals and loops is not very good in R. And is it better to work with separate objects instead of the new column? – Joschi Jan 08 '13 at 09:38