
I have a data frame of 70,000 rows that I want to reduce to 10,000. I know this means losing a lot of data, but I have my reasons. I want the removed rows to be spread evenly across the data set, rather than just dropping the first or last 60,000 rows. Is there a way to do this? In case it helps, my data frame looks like this:

ID   username     text              date
1    @calr        lorem ipsum...    2012-05-05
2    @mart        lorem ipsum...    2012-05-05
3    @falk        lorem ipsum...    2012-05-05
4    @grif        lorem ipsum...    2012-05-05
Quantizer

2 Answers

df[sample.int(70000, size = 10000),]
Jonathan

This solved my problem:

df[sample(nrow(df), 10000), ]
Quantizer
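
For anyone comparing the options, here is a minimal sketch of two ways to read "evenly distributed". It assumes a toy data frame in place of the real one; the `value` column and the names `n_keep`, `random_rows`, and `even_rows` are made up for illustration. `sample()` picks rows uniformly at random, so the kept rows are spread out on average but not equally spaced. If strictly equal spacing is what is wanted, `seq()` with `length.out` may be closer to the goal.

```r
# Toy stand-in for the 70,000-row data frame (the value column is illustrative)
set.seed(42)  # fixed seed so the random draw is reproducible
df <- data.frame(ID = 1:70000, value = rnorm(70000))

n_keep <- 10000  # number of rows to keep

# Option 1: simple random sample, sorted so the original row order is preserved
random_rows <- sort(sample(nrow(df), n_keep))
df_random <- df[random_rows, ]

# Option 2: evenly spaced rows from first to last, with no randomness
even_rows <- round(seq(1, nrow(df), length.out = n_keep))
df_even <- df[even_rows, ]
```

Both results have 10,000 rows. Option 1 is the same idea as the answers above, plus `sort()` to keep the rows in their original order. Option 2 always keeps the first and last row and takes roughly every seventh row in between.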