I am fairly new to R and can't find a concise way to solve this problem.

I have a dataframe in R called df that looks like the one below. It contains a column called value that holds values from 0 to 1, ordered numerically, and a binary column called flag that contains either 0 or 1.

df
value     flag
0.033     0
0.139     0
0.452     1
0.532     0
0.687     1
0.993     1

I wish to split this dataframe into X groups of equal width across the range 0 to 1. For example, for a 4-way split, the data would be split into 0-0.25, 0.25-0.5, 0.5-0.75 and 0.75-1. Each group should keep the corresponding flag values.

I want the solution to be scalable, so that I can split the data into more groups if needed. I am also limited to the tidyverse packages.

Does anyone have a solution for this? Thanks

zx8754
geds133
  • Why are you limited to the tidyverse packages? Sounds like an odd constraint to me. – Jaap Aug 24 '20 at 09:21
  • My work only allows that. – geds133 Aug 24 '20 at 09:25
  • Sounds like you'd better look for another job then. Limiting your employees to only tidyverse packages is (imho) one of the most stupid constraints you can impose (not blaming you though). You will miss a lot from the rich ecosystem R has. – Jaap Aug 24 '20 at 09:58

2 Answers

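If n is the number of partitions:

# Upper bound of each interval: 1/n, 2/n, ..., 1
L <- seq(1, n) / n

# One dataframe per interval (x - 1/n, x]; the upper bound is inclusive
# so that values sitting exactly on a boundary are not dropped
GroupedList <- lapply(L, function(x) {
  df[(df$value <= x) & (df$value > (x - (1/n))), ]
})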

I think this should produce a list of dataframes where each dataframe contains what you asked.
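
As a quick check, here is a minimal sketch of the same idea run on the sample data from the question. The names `upper`, `lower` and `grouped`, and the `(lower, upper]` interval convention, are my own assumptions for illustration and are not taken from the question.

```r
# Sample data copied from the question
df <- data.frame(
  value = c(0.033, 0.139, 0.452, 0.532, 0.687, 0.993),
  flag  = c(0, 0, 1, 0, 1, 1)
)

n <- 4                       # number of partitions
upper <- seq_len(n) / n      # upper bound of each interval
lower <- upper - 1 / n       # lower bound of each interval

# One dataframe per interval (lower, upper]
grouped <- lapply(seq_len(n), function(i) {
  df[df$value > lower[i] & df$value <= upper[i], ]
})

# Label each list element with its interval (assumed naming scheme)
names(grouped) <- paste0("(", lower, ",", upper, "]")

# Number of rows that landed in each interval
sapply(grouped, nrow)
```

Note that with this convention a value of exactly 0 would fall outside every interval, so the first comparison would need `>=` if 0 can occur in the data.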

dvd280

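You can use cut to divide the data into n groups and pass the result to split to get a list of dataframes. Note that with breaks = n the intervals are derived from the range of the data, not from 0 to 1.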

n <- 4
list_df <- split(df, cut(df$value, breaks = n))

If you want to split the range 0-1 into n equal-width groups, you can do:

# include.lowest = TRUE keeps a value of exactly 0 in the first group
list_df <- split(df, cut(df$value, seq(0, 1, length.out = n + 1), include.lowest = TRUE))
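
Since the question mentions being limited to the tidyverse, the same binning can probably be written with dplyr as well. This is only a sketch under a few assumptions: dplyr 1.0.0 or later is installed, and the names `n_groups`, `breaks`, `binned` and `bin_summary` are hypothetical ones chosen for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  value = c(0.033, 0.139, 0.452, 0.532, 0.687, 0.993),
  flag  = c(0, 0, 1, 0, 1, 1)
)

n_groups <- 4                                  # change this to rescale
breaks <- seq(0, 1, length.out = n_groups + 1) # 0, 0.25, 0.5, 0.75, 1

# Attach the interval each value falls into as a factor column
binned <- df %>%
  mutate(bin = cut(value, breaks = breaks, include.lowest = TRUE))

# Per-interval summary; .drop = FALSE keeps intervals that contain no rows
bin_summary <- binned %>%
  group_by(bin, .drop = FALSE) %>%
  summarise(rows = n(), flags = sum(flag), .groups = "drop")

# List of dataframes, one per non-empty interval
list_df <- group_split(binned, bin)
```

Keeping the interval as a column (`binned`) is often enough on its own, because later steps can use `group_by(bin)` instead of looping over a list. As far as I know `group_split()` drops empty intervals by default, so `split(binned, binned$bin)` may be the better choice if empty groups must be kept.
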
Ronak Shah
  • This has not split the groups in relation to 0 to 1. My first group is -0.0002 to 0.208. It is splitting by the data itself not range 0 to 1. – geds133 Aug 24 '20 at 08:42
  • If I read your question, it states `I wish to split this dataframe into X amount of groups based upon the values column.`. Yes, this divides the data into groups based on your `value` column. Neither 0 nor 1 is in your data. I think the updated answer will do what you need. – Ronak Shah Aug 24 '20 at 08:45
  • Apologies, I have fixed the question. – geds133 Aug 24 '20 at 08:47
  • The answers unfortunately don't work on my question. The first, uses a package that I am unable to use, which has been updated in the question. The final answer splits off a column called `i` which my dataframe doesn't contain. The other answer doesn't use dataframe at all. – geds133 Aug 24 '20 at 09:09
  • My answers don't use any package. Are you talking about the duplicates marked? – Ronak Shah Aug 24 '20 at 09:11
  • Yes sorry I assumed you had put them as no one else decided to comment. – geds133 Aug 24 '20 at 09:18
  • @geds133 My answer with `seq` doesn't give you the expected output? – Ronak Shah Aug 24 '20 at 09:33