How to delete a large number of rows from a .CSV file?

Question

I'm new to R and RStudio, so this may seem kind of odd.

I'm currently trying to remove some rows from a very big .CSV file (roughly 400,000 rows), but I'm running into some problems.

Example of the table (.CSV file):

Here is the output I want (in this example, row 6 was deleted):

1- When I execute:

tablename <- tablename[c(-row1, -row2), ]

for, let's say, 7 rows, it works just fine the first time I execute it. But if I then execute the same syntax for other rows that I want to delete, for example:

tablename <- tablename[c(-row3, -row4), ]

it doesn't seem to delete the rows that I specified.

2- Because of the problem described above, I tried to create a 'super' syntax that has all the rows I want to delete. For example:

tablename <- tablename[c(-row1, -row2, ..., -row299, -row300), ]

The problem is that this appears to do nothing (again). A '+' just appears in the console instead of the '>' prompt.

The last option I have is to delete all the unwanted rows from the .CSV file using the keyword search in WordPad, but that is not viable, since it would take me about 9 hours.

leerssej · Answer 1 · 2018-02-06T18:24:33.013

As per your further discussion of your intentions (found in the comments on Nick Knauer's response), copied here:

I'm going to give some background on the project and why I'm doing this. Consider the column 'Code'. It uniquely identifies a person (there can be more than one row for the same person, in which case the Code is the same). Consider another column, 'Class', which specifies a person's social class. I used an SQL command to check whether the same person has different social classes across the file. What I noticed is that the results show several social classes for the same person. The syntax I tried to write above is meant to delete the rows that have different social classes per person.

Aha! Just tell your machine to make these judgements AND have it filter them out accordingly. It is really good at it!

First, join the class table to the employee data frame with a left_join. Then group_by(code) %>% mutate(cnt_class = n_distinct(class)) reveals which codes have more than one class. After that, a filter lets you deal with those dupes however you need.
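The steps above can be sketched as follows. This is only a minimal illustration, assuming the code and class columns already sit in one data frame (so the left_join is skipped); the data frame `people` and its values are hypothetical, and keeping only the consistent persons is one possible choice of filter.

```r
library(dplyr)

# Hypothetical data: `code` identifies a person, `class` is the social class
people <- data.frame(
  code  = c(1, 1, 2, 2, 3),
  class = c("A", "A", "A", "B", "C")
)

# Count the distinct classes recorded for each person
flagged <- people %>%
  group_by(code) %>%
  mutate(cnt_class = n_distinct(class)) %>%
  ungroup()

# Keep only the rows of people whose class is the same on every row
consistent <- flagged %>% filter(cnt_class == 1)
```

Here person 2 has two classes, so both of that person's rows are dropped. Filtering on cnt_class > 1 instead would show the conflicting rows for inspection before anything is removed.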

If you would like more precise help, please post a reproducible example. TL;DR: use dput to make a data frame I can copy and paste into my RStudio and tinker with (never possible with a picture of the data).

For more info to help you leap up the learning curve, please see here for the simple menu/breakdown of other quick and easy dplyr data wrangling options - https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

score 0 · Answer 2 · answered Apr 15 '17 at 01:30

To delete a row in R, you can do it this way:

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)

newdf<-employ.data[-c(1,2),]

That is for specific rows. If you want to do it for a range of rows, you can do it this way (in this three-row example, this deletes every row):

newdf<-employ.data[-c(1:3),]
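For rows scattered through the file, the same negative indexing should work with any vector of positions. Below is a minimal sketch with a hypothetical five-row version of the data frame; `rows_to_drop` and the extra employees are made up for illustration. It also shows one possible reason a second round of deletions can look wrong: positions are renumbered after each deletion, so a later call refers to positions in the already shortened data frame.

```r
# Hypothetical five-row data frame for illustration
employ.data <- data.frame(
  employee = c("John Doe", "Peter Gynn", "Jolie Hope", "Ann Lee", "Sam Roe"),
  salary   = c(21000, 23400, 26800, 19500, 30100)
)

# Collect every unwanted position (relative to the ORIGINAL data) in one vector
rows_to_drop <- c(1, 4)

# Drop them all in a single step: removes "John Doe" and "Ann Lee"
newdf <- employ.data[-rows_to_drop, ]

# Caveat: deleting in two steps shifts the positions
step1 <- employ.data[-1, ]  # removes "John Doe"
step2 <- step1[-4, ]        # position 4 is now "Sam Roe", not "Ann Lee"
```

Deleting everything in one call, with positions taken from the original data, avoids the shifting problem.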

nak5120


Perhaps I didn't explain myself very well; I'm sorry about that. I can't use "newdf<-employ.data[-c(1:3),]" because the rows aren't together in a contiguous range. They are scattered throughout the file. – LuísA Apr 15 '17 at 01:36
Have you tried eliminating with the supersyntax for the newdf variable? So instead of saying "row1" you just write 1 and "row2" would be 2. You would be taking the indices of the rows rather than the row names – nak5120 Apr 15 '17 at 01:39
For example: TableName <- TableName[-c(1,2,...), ]? I did try that. Unfortunately, the result is the same. – LuísA Apr 15 '17 at 01:43
Can you revise the question with an original dataset and what you want the final output to look like? Doesn't have to be your actual dataset, just a template to base it off of – nak5120 Apr 15 '17 at 01:44
I've already uploaded two examples. I couldn't upload a third photo, so here is the link for the 'Super Syntax' example: http://i.imgur.com/Hz2T7Mf.png – LuísA Apr 15 '17 at 02:13
Is there a pattern or reason you are deleting those rows? It could be easier to delete rows by the pattern/reason rather than each one individually – nak5120 Apr 15 '17 at 02:15
I'm going to give some background on the project and why I'm doing this. Consider the column 'Code'. It uniquely identifies a person (there can be more than one row for the same person, in which case the Code is the same). Consider another column, 'Class', which specifies a person's social class. I used an SQL command to check whether the same person has different social classes across the file. What I noticed is that the results show several social classes for the same person. The syntax I tried to write above is meant to delete the rows that have different social classes per person. – LuísA Apr 15 '17 at 02:24
