Shrinking dataframe randomly in R

Question

I have a data frame of 70,000 rows that I want to reduce to 10,000. I know the cost is a large loss of data, but I have my reasons. I want the removed rows to be spread evenly throughout the data set, rather than just dropping the first or last 60,000 rows. Is there a way to do this? In case it helps, my data frame looks like this:

ID   username     text              date
1    @calr        lorem ipsum...    2012-05-05
2    @mart        lorem ipsum...    2012-05-05
3    @falk        lorem ipsum...    2012-05-05
4    @grif        lorem ipsum...    2012-05-05

score 2 · Answer 1 · answered Jun 09 '22 at 21:00

df[sample.int(nrow(df), size = 10000), ]
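
A slightly fuller sketch of the same idea: `sample.int()` draws distinct row indices uniformly at random (no replacement), `set.seed()` makes the draw reproducible, and sorting the indices keeps the kept rows in their original order. The data frame built here is a hypothetical stand-in with the question's column names; the seed value and the names `keep` and `df_small` are arbitrary choices for illustration.

```r
# Hypothetical stand-in for the question's data frame: 70,000 rows
# with the same column names (ID, username, text, date).
df <- data.frame(
  ID = 1:70000,
  username = "@user",
  text = "lorem ipsum...",
  date = as.Date("2012-05-05")
)

set.seed(42)  # arbitrary seed, only so the same rows are drawn every run

# Draw 10,000 distinct row indices uniformly at random (no replacement),
# then sort them so the subset keeps the original row order.
keep <- sort(sample.int(nrow(df), size = 10000))
df_small <- df[keep, ]

nrow(df_small)  # [1] 10000
```

Because every row has the same chance of being kept, the retained rows are spread across the whole data set on average, though not at perfectly regular intervals.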

Jonathan

score 0 · Accepted Answer · answered Jun 09 '22 at 21:00

This solved my problem

df[sample(nrow(df), 10000), ]
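
If "evenly distributed" should mean strictly regular spacing rather than a uniform random draw, systematic sampling is an alternative. Note that this is a different technique from the random sampling used in the answers: it keeps row positions on an evenly spaced grid from the first row to the last. A minimal sketch, assuming the same hypothetical 70,000-row stand-in `df`; `n_keep`, `idx`, and `df_even` are illustrative names.

```r
# Hypothetical stand-in for the question's data frame (70,000 rows).
df <- data.frame(
  ID = 1:70000,
  username = "@user",
  text = "lorem ipsum...",
  date = as.Date("2012-05-05")
)

n_keep <- 10000

# Systematic sample: n_keep positions evenly spaced from the first row
# to the last (roughly every 7th row here), rounded to whole row indices.
# The spacing is > 1, so rounding never produces duplicate indices.
idx <- round(seq(1, nrow(df), length.out = n_keep))
df_even <- df[idx, ]

nrow(df_even)  # [1] 10000
```

One caveat with this design: if the data has a periodic pattern whose period matches the step size, a fixed grid can over- or under-represent parts of that pattern, which random sampling avoids.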

Quantizer
