
I apologize, because it seems this question has been asked many times, but I have read through several questions and answers, tried different solutions, and am still having problems, so I hope someone can help!

I have a dataframe with nearly 30 million observations (rows) and 6 variables (columns), and I want to delete the last ~5 million observations.

I have tried the following three procedures:

#read in the csv
data <-read.csv('mydata.csv')

#try this
#delete specified rows
dataresized <- data[-24579580:-29495496]

#try this instead
#keep only first 24579580 rows (X=id or rownumber)
dataresized2 <- subset(data, "X" < 24579581)

#try this instead
unwantedrows <- data %in% 24579580:29495496
dataresized3 <- data[!unwantedrows] 

The first code didn't seem to do anything, i.e., no rows were removed. The second option seemed to remove everything, i.e., no rows remained. The third option seemed to crash the system.

Any suggestions would be greatly appreciated! Thanks!

user3251223
    with data frames, you have to use both index arguments (e.g. `[-(1:10), ]`, notice the comma and the blank), otherwise you're removing columns. Also, for subset, you should just use the variable name. – BrodieG Jan 30 '14 at 23:30
  • Thank you! adding the comma and blank to the index argument worked perfectly :) – user3251223 Jan 31 '14 at 00:05
    Tho' in your case (removing the "bottom" of the dataframe), there's no real need to use negative indexing. `data<-data[1:N,]` will work just as well. – Carl Witthoft Jan 31 '14 at 02:01
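Putting the comments together, here is a sketch of the three attempts corrected, using a small stand-in data frame (100 rows instead of ~29.5 million, with an assumed `X` id column) so the indexing behavior is easy to check:

```r
# Small stand-in for the real data frame (assumed columns: X = row id, value).
data <- data.frame(X = 1:100, value = rnorm(100))

# Attempt 1, fixed: negative row indexing needs the comma (row, column) and
# parentheses around the negated range, otherwise it tries to drop columns.
dataresized <- data[-(91:100), ]

# Attempt 2, fixed: subset() takes the unquoted column name; the quoted
# string "X" was compared to a number, which is why everything vanished.
dataresized2 <- subset(data, X < 91)

# Simplest for dropping the tail (per the last comment): keep the first N rows.
dataresized3 <- data[1:90, ]

nrow(dataresized)
```

On the real data the equivalents would be `data[-(24579580:29495496), ]`, `subset(data, X < 24579580)`, or simply `data[1:24579579, ]`.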
