0

I have a data.frame in R that consists in a number of columns with numerical values. Like this:

   A       B      C
0.6057  0.1644  6.93
0.5723  0.117   6.59
0.5614  0.1552  7.02
0.4102  0.1059  5.24
0.4945  0.0857  6.64
0.5157  0.0747  7.06
0.7785  0.1394  5.21
0.5492  0.1557  6.06
0.5411  0.1884  5.68
0.6622  0.148   6.1

For each of these columns, I want to create a new column containing the quartile values. I have no problem doing it over one column at a time using this formula:

tableOne <- within(data, quartile <-
                    as.integer(cut(A, quantile(A, probs=0:5/5,na.rm=T))))

But as I have 100 columns with different names, I wanted to loop over each column separately.

I tried a loop without success:

for(i in names(data)){
  tableOne <- within(data, quarti <- as.integer(cut(i, quantile(i, probs=0:5/5,na.rm=T))))
}

I get the following error:

Error in cut.default(i, quantile(i, probs = 0:5/5, na.rm = T)) : 
  'x' must be numeric

I also tried apply function:

df.two <- lapply(df, function(x) within(data, quartile <- as.integer(cut(x, quantile(x, probs=0:5/5,na.rm=T)))))

with no success:

Error during wrapup: argument "obj" is missing, with no default
Error during wrapup: target context is not on the stack

Any advices on how to iterate my functions over all the columns and get all results in the same data.frame?

Thanks a lot

smci
  • 32,567
  • 20
  • 113
  • 146
user3188922
  • 329
  • 1
  • 3
  • 19
  • Create a function (with a vector as parameter) and use `lapply`. – Roland Dec 15 '15 at 16:27
  • I tried with apply but I am not better at it than the loop! – user3188922 Dec 15 '15 at 16:28
  • `df.two <- lapply(data, function(x) within(data, quartile <- as.integer(cut(x, quantile(x, probs=0:5/5,na.rm=T)))))` – user3188922 Dec 15 '15 at 16:29
  • Maybe, `cbind(df, lapply(df, function(x) as.integer(cut(x, quantile(x, probs=0:5/5,na.rm=T)))))`, but I can't test since you don't provide a reproducible example. – Roland Dec 15 '15 at 16:33
  • Yeah sorry I did not want to bother everybody with a long dataset but unfortunatelly I can't make it work! So I am going to edit my post and add some dataset – user3188922 Dec 15 '15 at 16:36
  • 1
    You example dataset does not need more than 3 columns and maybe 10 rows. – Roland Dec 15 '15 at 16:37

1 Answers1

5

See end of answer for a better approach, this one is for easy understanding of the steps.

I'm unsure what you're willing to do, but maybe this:

df2<- as.data.frame( lapply( df, function(x){
  as.integer( cut(x, quantile(x, probs=(0:5)/5, na.rm=T)))
}))
colnames(df2) <- paste0("quartile_",colnames(df))
df3 <- cbind(df,df2)

Which gives:

        A      B    C quartile_A quartile_B quartile_C
1  0.6057 0.1644 6.93          4          5          4
2  0.5723 0.1170 6.59          4          2          3
3  0.5614 0.1552 7.02          3          4          5
4  0.4102 0.1059 5.24         NA          2          1
5  0.4945 0.0857 6.64          1          1          4
6  0.5157 0.0747 7.06          2         NA          5
7  0.7785 0.1394 5.21          5          3         NA
8  0.5492 0.1557 6.06          3          4          2
9  0.5411 0.1884 5.68          2          5          2
10 0.6622 0.1480 6.10          5          3          3

Datas used:

> dput(df)
structure(list(A = c(0.6057, 0.5723, 0.5614, 0.4102, 0.4945, 
0.5157, 0.7785, 0.5492, 0.5411, 0.6622), B = c(0.1644, 0.117, 
0.1552, 0.1059, 0.0857, 0.0747, 0.1394, 0.1557, 0.1884, 0.148
), C = c(6.93, 6.59, 7.02, 5.24, 6.64, 7.06, 5.21, 6.06, 5.68, 
6.1)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-10L))

As per @DavidArenburg comment below a better way to achieve the same result is:

df[paste0("quartile_",colnames(df))] <- lapply(df, function(x) as.integer(cut(x, quantile(x, probs=(0:5)/5, na.rm = TRUE))))

This avoid creating a new dataframe and copying it over at end.

Tensibai
  • 15,557
  • 1
  • 37
  • 57
  • You could do this in one go using `df[paste0("quartile_",colnames(df))] <- lapply(df, function(x) as.integer(cut(x, quantile(x, probs=(0:5)/5, na.rm = TRUE))))` – David Arenburg Dec 15 '15 at 17:58
  • @David I kept it in 3 steps to ease the understanding (and been too lazy to shorten it on 1 pass), that said you should add it as another answer for future readers ;) – Tensibai Dec 15 '15 at 18:01
  • 1
    I think you could just edit... I dont see a reason to convert to `data.frame` again, create a new object and then copy everything again using `cbind`. The above line isn't just shorter but is much more efficient I would think. – David Arenburg Dec 15 '15 at 18:03
  • @David and as it's yours, I don't wish to steal it ;) It's perfectly valid to have multiple answers, and here it's a different approach so it should stand on it's own to allow other to vote for the most useful/performant one to guide readers on their choose (In my humble opinion) – Tensibai Dec 15 '15 at 18:06