29

I have a large csv file in which some rows are entirely blank. How do I use Python to delete all blank rows from the csv?

After all your suggestions, this is what I have so far:

import csv

# open input csv for reading (binary mode for the Python 2 csv module)
inputCSV = open(r'C:\input.csv', 'rb')

# create output csv for writing
outputCSV = open(r'C:\OUTPUT.csv', 'wb')

# create reader object
cr = csv.reader(inputCSV, dialect='excel')

# create writer object
cw = csv.writer(outputCSV, dialect='excel')

# add pre-defined fields
cw.writerow(['FIELD1_', 'FIELD2_', 'FIELD3_', 'FIELD4_'])

# delete existing field names in input CSV
# ???????????????????????????

# loop through input csv, skip rows whose fields are all blank,
# and write the rest to the output csv
for row in cr:
    if any(field.strip() for field in row):
        cw.writerow(row)

# close files
inputCSV.close()
outputCSV.close()

Is this ok or is there a better way to do this?

Unihedron
debugged
  • Why is the fact that the file's a CSV file relevant? – Robert Rossney Dec 23 '10 at 19:37
  • Only to see if using csv module had significant advantages over not using it. – debugged Dec 23 '10 at 19:39
  • Using the csv module has one major advantage outlined by Laurence Gonsalves: its when the input file has blank lines embedded in quoted csv fields. – Paulo Scardine Dec 23 '10 at 19:44
  • you mean like '','','','' ? How do I check for that as well? Also, how do I delete a particular row. Say first row or fifth row in the file. – debugged Dec 23 '10 at 19:49
  • @debugged: **There is one major problem with the accepted answer: the files should be open in BINARY MODE (Python 2.X) otherwise on Windows the CR LF processing messes up the results.** – John Machin Dec 23 '10 at 23:09

11 Answers

38

Use the csv module:

import csv
...

with open(in_fnam, newline='') as in_file:
    with open(out_fnam, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            if row:
                writer.writerow(row)

If you also need to remove rows where all of the fields are empty, change the if row: line to:

if any(row):

And if you also want to treat fields that consist of only whitespace as empty you can replace it with:

if any(field.strip() for field in row):

Note that in Python 2.x and earlier, the csv module expected binary files, and so you'd need to open your files with the 'b' flag. In 3.x, doing this will result in an error.
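For a concrete picture of how the three checks differ, here is a small sketch (the sample data is made up): `if row:` drops only truly blank lines, `if any(row):` also drops rows of empty fields, and the `.strip()` version additionally drops rows of whitespace-only fields.

```python
import csv
import io

# Hypothetical sample: a header row, a truly blank line, a row of
# empty fields, a row of whitespace-only fields, and a data row.
raw = "a,b,c\n\n,,\n , \nd,e,f\n"
rows = list(csv.reader(io.StringIO(raw)))

print(len([r for r in rows if r]))                          # 4 (drops only the blank line)
print(len([r for r in rows if any(r)]))                     # 3 (also drops ['', '', ''])
print(len([r for r in rows if any(f.strip() for f in r)]))  # 2 (also drops [' ', ' '])
```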

Laurence Gonsalves
  • huh, the same code works without csv module if you use `if row.strip()` – nosklo Dec 23 '10 at 18:51
  • Thanks LG, that worked for me. Although I also have some lines w/ blank fields. Ex: ',,,,,\n'. How do I check for blank rows and rows w/ blank fields using csv module. – debugged Dec 23 '10 at 18:59
  • @debugged: `empty= all( len(c) == 0 for c in row )` – S.Lott Dec 23 '10 at 19:14
  • @nosklo: +1, csv module looks like an overkill. – Paulo Scardine Dec 23 '10 at 19:18
  • @debugged: also with `if ''.join(row).strip(): continue` – Paulo Scardine Dec 23 '10 at 19:19
  • @noskio @Paulo: it's possible to have blank lines that are part of a non-empty row in a csv file. eg: `'foo, "bar\n\nbaz", quux'` has an empty line but is a single csv row. – Laurence Gonsalves Dec 23 '10 at 19:31
  • @Laurence Gonsalves: +1 - indeed, in this case the csv module is justified. – Paulo Scardine Dec 23 '10 at 19:36
  • `any(row)` should work the same as `any(field for field in row)` – jfs Dec 23 '10 at 21:39
  • @J.F. Sebastian Thanks. I don't know why I wrote it that way. I think I was thinking ahead to the `.strip()` example. – Laurence Gonsalves Dec 23 '10 at 22:05
  • @Laurence Gonsalves: **There is one major problem with this answer: the files should be open in BINARY MODE (Python 2.X) otherwise on Windows the CR LF processing messes up the results** – John Machin Dec 23 '10 at 23:07
  • @John: Thanks. I didn't know this. I work primarily on Linux where 'b' is a no-op. I see that the csv module docs' examples all use the binary flag (though I wish they explicitly said something about this) so I've updated the answer. BTW: making your comment bold wasn't really necessary. – Laurence Gonsalves Dec 23 '10 at 23:18
  • @Laurence Gonsalves: Python 2.7.1 docs "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference" is explicit enough, in my book. It should also say that it's strongly advised on portability grounds on platforms where it doesn't make a difference, and on forums with a multi-cultural audience. – John Machin Dec 23 '10 at 23:46
  • @John: I was looking for the word "binary", so I missed that note. I agree that that's explicit enough. – Laurence Gonsalves Dec 24 '10 at 00:48
  • I was using map to handle this, no way to do it if you're using res = map(process_row, reader)? I changed it to a for loop. – radtek Jan 09 '15 at 21:36
  • @radtek `map` can't add or delete elements, only modify elements. If you want a more functional approach than using a `for` loop you can either use `filter` or a comprehension. eg: `res = [process_row(row) for row in reader if row]` will remove empty rows and process the rest with `process_row`. – Laurence Gonsalves Jan 09 '15 at 22:56
  • @LaurenceGonsalves yep thats what I had done, I just used a for loop. – radtek Jan 11 '15 at 20:05
  • just avoid using `input` and `output` as variable names if you copy-paste this solution. It's bad practice to shadow built-in names – Kots Nov 06 '18 at 15:06
  • Got error: for row in csv.reader(input): _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) I'm about done with programming. Been doing it for years and still literally everything doesn't work. I spend entire days just trying to do stupid things like this BECAUSE NOTHING EVER WORKS!!! – Wayne Filkins May 25 '19 at 02:03
  • @WayneFilkins I understand your frustration. The answer was written over 8 years ago, and things have changed since then. I have updated the answer to work with modern versions of Python. – Laurence Gonsalves May 27 '19 at 23:56
  • @Kots: Good point. I've changed the names of those variables in the answer. It's unfortunate that Python uses the word "input" for an almost entirely useless function. – Laurence Gonsalves May 27 '19 at 23:57
  • @JohnMachin It's interesting to note that using the binary flag with Python 3.x's `csv` module (at least since 3.5) results in an error. The docs now say "`csvfile` can be any object which supports the iterator protocol and returns a string each time its `__next__()` method is called". Using the binary flag results in an object that when iterated returns an instance of `bytes`, not `str`. I've updated the answer to 3.x, and moved the binary flag for 2.x to a footnote. – Laurence Gonsalves May 28 '19 at 00:02
11

Surprised that nobody here mentioned pandas. Here is a possible solution.

import pandas as pd
df = pd.read_csv('input.csv')
df.to_csv('output.csv', index=False)
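Note that this works because `pandas.read_csv` skips fully blank lines by default (`skip_blank_lines=True`). Rows that contain only delimiters (e.g. a lone `,`) come through as all-NaN and survive the round trip; they would need an explicit `dropna`. A sketch with made-up data:

```python
import io

import pandas as pd

# Hypothetical two-column data: one truly blank line, one all-empty row.
raw = "a,b\n1,2\n\n,\n3,4\n"

df = pd.read_csv(io.StringIO(raw))  # the blank line is skipped by default
print(len(df))                      # 3 - the ',' line became a NaN,NaN row

clean = df.dropna(how='all')        # drop rows where every field is NaN
print(len(clean))                   # 2
```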
Sagun Shrestha
  • Pandas is too big of a library to use just for this case. If you are already using pandas for other stuff then this might be a viable option. – Aabesh Karmacharya Sep 02 '21 at 05:17
8

Delete empty rows from a .csv file using Python:

import csv

with open('demo004.csv', newline='') as in_file, open('demo005.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for row in csv.reader(in_file):
        if any(field.strip() for field in row):
            writer.writerow(row)

Thank you

5

Doing it with pandas is very simple. Open your csv file with pandas:

import pandas as pd
df = pd.read_csv("example.csv")
# check the number of empty rows in the csv file
print(df.isnull().sum())
# drop the empty rows
modifiedDF = df.dropna()
# save it to a new csv file
modifiedDF.to_csv('modifiedExample.csv', index=False)
Eric Aya
Hamza Tayyab
3

You have to open a second file, write all non-blank lines to it, delete the original file and rename the second file to the original name.

EDIT: a real blank line will be like '\n':

for line in f1.readlines():
    if line.strip() == '':
        continue
    f2.write(line)

A line with all blank fields would look like ',,,,,\n'. If you consider this a blank line:

for line in f1.readlines():
    if ''.join(line.split(',')).strip() == '':
        continue
    f2.write(line)

Opening, closing, deleting and renaming the files are left as an exercise for you. (hint: import os, help(open), help(os.rename), help(os.unlink))

EDIT2: Laurence Gonsalves brought to my attention that a valid csv file could have blank lines embedded in quoted csv fields, like 1, 'this\n\nis tricky',123.45. In this case the csv module will take care of that for you. I'm sorry Laurence, your answer deserved to be accepted. The csv module will also address the concerns about a line like "","",""\n.
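To make the embedded-newline case concrete, a quick sketch (the sample record is made up): a naive line-based filter sees a blank physical line in the middle of what is really one csv record, while `csv.reader` keeps the record intact.

```python
import csv
import io

# One logical csv record whose quoted field spans three physical lines.
raw = 'foo,"bar\n\nbaz",quux\n'

print(len(raw.splitlines()))          # 3 physical lines, one of them blank

rows = list(csv.reader(io.StringIO(raw)))
print(len(rows))                      # 1 logical row
print(rows[0])                        # ['foo', 'bar\n\nbaz', 'quux']
```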

Paulo Scardine
  • Ok. How do I check if a line is blank? I'm looking for the code please – debugged Dec 23 '10 at 18:39
  • Thanks Paulo. I have both situations going on in my csv. Blank lines and all blank fields in the line like you mentioned above. Now, what is the advantage/disadvantage of using your method over csv module? – debugged Dec 23 '10 at 19:00
  • @debugged: the csv method as described in the upvoted answer will not remove a line with all blank fields. If you are only filtering blanks out, the csv module seems like an overkill. If you will make further manipulation, then go with the csv module as it will split each csv line into a convenient python list. – Paulo Scardine Dec 23 '10 at 19:13
  • Great stuff! Thanks for your answers Paulo. I appreciate your being specific and detailed. – debugged Dec 23 '10 at 19:29
  • This answer trades correctness for simplicity. Yes, using the csv module is slightly more complicated, but it actually works correctly for cases like newlines embedded in quoted fields. – Laurence Gonsalves Dec 23 '10 at 19:33
  • @debugged: see Laurence Gonsalves comment where he outlines a situation where the cvs module is justified. – Paulo Scardine Dec 23 '10 at 19:40
  • -1 Using readlines is the pox. So is totally ignoring the fact that the file is a CSV file. **This answer should NOT be accepted!!!** – John Machin Dec 23 '10 at 21:28
  • @John Machin: the answer was edited to reflect the correct solution. Still the original is valid if you know the cvs will not have string fields, no need to be mean. – Paulo Scardine Dec 23 '10 at 22:56
  • @Paulo Scardine: Quite apart from the fact that one should use the csv module for processing CSV files irrespective of what one thinks that one knows, and that the OP didn't say that he had no strings in his CSV file, putting up crap code like readlines was enough to qualify for the -1. – John Machin Dec 23 '10 at 23:56
2

Python code to remove blank lines from a csv file without creating another file:

import csv

def ReadWriteconfig_file(file):
    try:
        file_object = open(file, 'r')
        lines = csv.reader(file_object, delimiter=',', quotechar='"')
        flag = 0
        data = []
        for line in lines:
            if line == []:
                flag = 1
                continue
            else:
                data.append(line)
        file_object.close()
        if flag == 1:  # a blank line was present in the file
            file_object = open(file, 'w')
            for line in data:
                file_object.write(','.join(line) + "\n")
            file_object.close()
    except Exception as e:
        print(e)
vaibhav
2

Here is a solution using pandas that removes blank rows.

import pandas as pd
df = pd.read_csv('input.csv')
df.dropna(axis=0, how='all', inplace=True)
df.to_csv('output.csv', index=False)
Aizayousaf
1

I need to do this, but without a blank row being written at the end of the CSV file, which this code unfortunately does (and which is also what Excel does if you Save -> .csv). My (even simpler) code using the csv module does this too:

import csv

input = open("M51_csv_proc.csv", 'rb')
output = open("dumpFile.csv", 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    writer.writerow(row)
input.close()
output.close() 

M51_csv_proc.csv has exactly 125 rows; the program always outputs 126 rows, the last one being blank.

I've been through all these threads and nothing seems to change this behaviour.

  • to avoid the 'close' lines, it is preferable to use (on reads) `with open(filename) as in_file:`; to avoid the addition of redundant \r or \n on writes, use `with open(filename, 'w+', newline='') as out_file:` – DarkLight Nov 05 '19 at 14:07
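Following the comment above, a Python 3 sketch of the round trip (file name hypothetical): opening files with `newline=''` lets the csv module control line endings itself, which avoids both the Windows `\r\r\n` blank-row effect on writes and any spurious empty row when reading back.

```python
import csv
import os
import tempfile

rows = [['a', 'b'], ['c', 'd']]
path = os.path.join(tempfile.mkdtemp(), 'dump.csv')

# newline='' on write: the csv module emits its own \r\n terminators
with open(path, 'w', newline='') as out_file:
    csv.writer(out_file).writerows(rows)

# newline='' on read: the trailing newline does not become a blank row
with open(path, newline='') as in_file:
    print(list(csv.reader(in_file)))    # [['a', 'b'], ['c', 'd']]
```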
0

In this script, all the CR/CRLF characters are removed from a CSV file that has lines like this:

"My name";mail@mail.com;"This is a comment.
Thanks!"

Execute the script https://github.com/eoconsulting/lr2excelcsv/blob/master/lr2excelcsv.py

Result (in Excel CSV format):

"My name",mail@mail.com,"This is a comment. Thanks!"
Unihedron
Mariano Ruiz
0

Replace PATH_TO_YOUR_CSV with your file path:

import pandas as pd

df = pd.read_csv('PATH_TO_YOUR_CSV')
new_df = df.dropna()
new_df.to_csv('output.csv', index=False)

or in-line:

import pandas as pd

pd.read_csv('data.csv').dropna().to_csv('output.csv', index=False)
Andrea Ciccotta
0

I had the same problem.

I converted the .csv file to a dataframe and after that I converted the dataframe back to the .csv file.

The initial .csv file with the blank lines was 'csv_file_logger2.csv'.

So I did the following:

import pandas as pd

df = pd.read_csv('csv_file_logger2.csv')
df.to_csv('out2.csv', index=False)
rawrex