I apologize if this has been asked many times. I have read through several questions and answers and tried different solutions, but I am still having problems, so I hope someone can help!
I have a dataframe with nearly 30 million observations (rows) and 6 variables (columns), and I want to delete the last ~5 million observations.
I have tried the following three procedures:
#read in the csv
data <- read.csv('mydata.csv')
#try this
#delete specified rows
dataresized <- data[-24579580:-29495496]
#try this instead
#keep only the first 24579580 rows (X = id or row number)
dataresized2 <- subset(data, "X" < 24579581)
#try this instead
unwantedrows <- data %in% 24579580:29495496
dataresized3 <- data[!unwantedrows]
The first attempt didn't seem to do anything, i.e., no rows were removed. The second seemed to remove everything, i.e., no rows remained. The third seemed to crash the system.
Any suggestions would be greatly appreciated! Thanks!
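To make the goal concrete, here is a minimal sketch on a toy data frame of the row-wise versions I think I should be writing. The 10-row `toy` frame and the cutoff `n_keep` are made up for illustration, and I am assuming `X` holds the row number as in my file. As far as I understand, a data frame indexed without a comma is treated as a list of columns, and a quoted `"X"` is just a string rather than the column, which may explain the results above.

```r
# Toy stand-in for the real data: X is assumed to be the row id.
toy <- data.frame(X = 1:10, value = seq(10, 100, by = 10))
n_keep <- 7  # hypothetical cutoff: keep rows 1..7, drop rows 8..10

# Row index goes BEFORE the comma; leaving the column slot empty keeps all columns.
kept_by_position <- toy[seq_len(n_keep), ]

# Same idea with negative indices: drop the unwanted row range.
kept_by_dropping <- toy[-((n_keep + 1):nrow(toy)), ]

# subset() with the column name unquoted, so it refers to the column X.
kept_by_subset <- subset(toy, X <= n_keep)

# head() keeps the first n rows directly.
kept_by_head <- head(toy, n_keep)

print(nrow(kept_by_position))
print(nrow(kept_by_dropping))
print(nrow(kept_by_subset))
print(nrow(kept_by_head))
```

On the real data the cutoff would be the last row I want to keep instead of `n_keep`; I have only sketched this at toy scale, so memory use at 30 million rows is still an open question.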