I am using R to create a column called group in my data frame df that splits the data into 20 equal-sized groups, so that each row's group value is the group that row belongs to. A sample of my ordered data looks like this:
preds ground_truth
65378 0.000002975379 0
27082 0.000004721652 0
26890 0.000006613435 1
130498 0.000007634303 0
173319 0.000007834359 0
20039 0.000009482496 0
64722 0.000009482496 0
53924 0.000009482496 0
165543 0.000009482496 0
I have asked a similar question before, and there are related answers, but their solutions do not work on my data for some reason. The other answers are here:
Splitting a continuous variable into equal sized groups
R divide data into groups
My attempt was to use cut like this:
df$group <- cut(index(df), 20, labels = FALSE)
I expected this to cut the data frame index into 20 equal groups, so that across the 129844 rows there would be about 6492 rows in each group (129844 / 20 = 6492.2). However, this produces only a single group and does not split the data at all. Could someone explain why cut does not work here when it has worked on other data frames?
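One thing worth checking is what index(df) actually returns. index() is not a base R function (zoo, for example, exports one), so the result depends on which package supplies it. If the vector handed to cut() is a single value, or all identical values, cut() can only produce one label, and assigning a length-1 result to df$group recycles it to every row, which would match the symptom. A minimal sketch, using a hypothetical random data frame of the same size, that contrasts cut() on the row positions seq_len(nrow(df)) with cut() on a single value:

```r
set.seed(42)
n <- 129844
# Hypothetical stand-in for the real df (same row count, random preds)
df <- data.frame(preds = runif(n))

# cut() on the row positions 1..n: 20 equal-width intervals over 1..n,
# which for consecutive integers means 20 groups of 6492 or 6493 rows
df$group <- cut(seq_len(nrow(df)), 20, labels = FALSE)
table(df$group)

# If cut() receives a single value, it returns a single label and the
# assignment recycles it to every row, so only one group appears
df$group_bad <- cut(1, 20, labels = FALSE)
length(unique(df$group_bad))
```

Note that cut(seq_len(nrow(df)), ...) groups rows by their current position, so it only follows preds if the data frame is already sorted the way you want.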
I am happy to supply any extra information.
EDIT: I need the groups to follow the order of preds: the first group should contain the 6492 highest values, the second group the next 6492 highest, and so on.
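For groups ordered by preds, one base R approach is to sort the rows by preds in decreasing order and then label consecutive blocks of rows 1 to 20 with ceiling(i * k / n), which gives group sizes that differ by at most one. A minimal sketch, assuming a hypothetical random data frame in place of the real df and k = 20 groups:

```r
set.seed(1)
# Hypothetical data standing in for the real df
df <- data.frame(preds = runif(129844),
                 ground_truth = factor(sample(0:1, 129844, replace = TRUE)))

k <- 20  # number of groups

# Sort rows so the highest preds come first
df <- df[order(df$preds, decreasing = TRUE), ]

# Row i (in sorted order) goes to group ceiling(i * k / n):
# group 1 holds the highest preds, group k the lowest
n <- nrow(df)
df$group <- ceiling(seq_len(n) * k / n)

table(df$group)
```

If dplyr is available, dplyr::ntile(dplyr::desc(preds), 20) produces the same kind of ordered, near-equal groups without sorting the data frame first. With either approach, rows with tied preds that fall on a group boundary can end up in different groups.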
Here is a dput of the first 10 rows:
structure(list(
  preds = c(0.00000297537922317814, 0.00000472165221855588,
            0.0000066134351160987, 0.00000763430272198875,
            0.00000783435945631941, 0.00000948249581302744,
            0.00000948249581314139, 0.00000948249581314247,
            0.00000948249581314704, 0.0000094824958131879),
  ground_truth = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                           .Label = c("0", "1"), class = "factor")),
  .Names = c("preds", "ground_truth"),
  row.names = c("65378", "27082", "26890", "130498", "173319",
                "20039", "64722", "53924", "165543", "168952"),
  class = "data.frame")