29

I have a large csv file in which some rows are entirely blank. How do I use Python to delete all blank rows from the csv?

After all your suggestions, this is what I have so far:

import csv

# open input csv for reading (binary mode for the Python 2 csv module)
inputCSV = open(r'C:\input.csv', 'rb')

# create output csv for writing
outputCSV = open(r'C:\OUTPUT.csv', 'wb')

# create reader object
cr = csv.reader(inputCSV, dialect='excel')

# create writer object
cw = csv.writer(outputCSV, dialect='excel')

# add pre-defined fields
cw.writerow(['FIELD1_', 'FIELD2_', 'FIELD3_', 'FIELD4_'])

# delete existing field names in input CSV
# ???????????????????????????

# loop through input csv, skip rows whose fields are all blank,
# and write the rest to the output csv
for row in cr:
    if any(field.strip() for field in row):
        cw.writerow(row)

# close files
inputCSV.close()
outputCSV.close()

Is this ok or is there a better way to do this?

Unihedron
debugged
  • Why is the fact that the file's a CSV file relevant? – Robert Rossney Dec 23 '10 at 19:37
  • Only to see if using csv module had significant advantages over not using it. – debugged Dec 23 '10 at 19:39
  • Using the csv module has one major advantage outlined by Laurence Gonsalves: its when the input file has blank lines embedded in quoted csv fields. – Paulo Scardine Dec 23 '10 at 19:44
  • you mean like '','','','' ? How do I check for that as well? Also, how do I delete a particular row. Say first row or fifth row in the file. – debugged Dec 23 '10 at 19:49
  • @debugged: **There is one major problem with the accepted answer: the files should be open in BINARY MODE (Python 2.X) otherwise on Windows the CR LF processing messes up the results.** – John Machin Dec 23 '10 at 23:09

11 Answers

38

Use the csv module:

import csv
...

with open(in_fnam, newline='') as in_file:
    with open(out_fnam, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            if row:
                writer.writerow(row)

If you also need to remove rows where all of the fields are empty, change the if row: line to:

if any(row):

And if you also want to treat fields that consist of only whitespace as empty you can replace it with:

if any(field.strip() for field in row):

Note that in Python 2.x and earlier, the csv module expected binary files, and so you'd need to open your files with the 'b' flag. In 3.x, doing this will result in an error.
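For a concrete picture of how the three checks differ, here is a small sketch (the sample data is made up): `if row:` drops only truly blank lines, `if any(row):` also drops rows of empty fields, and the `.strip()` version additionally drops rows of whitespace-only fields.

```python
import csv
import io

# Hypothetical sample: a header row, a truly blank line, a row of
# empty fields, a row of whitespace-only fields, and a data row.
raw = "a,b,c\n\n,,\n , \nd,e,f\n"
rows = list(csv.reader(io.StringIO(raw)))

print(len([r for r in rows if r]))                          # 4 (drops only the blank line)
print(len([r for r in rows if any(r)]))                     # 3 (also drops ['', '', ''])
print(len([r for r in rows if any(f.strip() for f in r)]))  # 2 (also drops [' ', ' '])
```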

Laurence Gonsalves
  • huh, the same code works without csv module if you use `if row.strip()` – nosklo Dec 23 '10 at 18:51
  • Thanks LG, that worked for me. Although I also have some lines w/ blank fields. Ex: ',,,,,\n'. How do I check for blank rows and rows w/ blank fields using csv module. – debugged Dec 23 '10 at 18:59
  • @debugged: `empty= all( len(c) == 0 for c in row )` – S.Lott Dec 23 '10 at 19:14
  • @nosklo: +1, csv module looks like an overkill. – Paulo Scardine Dec 23 '10 at 19:18
  • @debugged: also with `if ''.join(row).strip(): continue` – Paulo Scardine Dec 23 '10 at 19:19
  • @noskio @Paulo: it's possible to have blank lines that are part of a non-empty row in a csv file. eg: `'foo, "bar\n\nbaz", quux'` has an empty line but is a single csv row. – Laurence Gonsalves Dec 23 '10 at 19:31
  • @Laurence Gonsalves: +1 - indeed, in this case the csv module is justified. – Paulo Scardine Dec 23 '10 at 19:36
  • `any(row)` should work the same as `any(field for field in row)` – jfs Dec 23 '10 at 21:39
  • @J.F. Sebastian Thanks. I don't know why I wrote it that way. I think I was thinking ahead to the `.strip()` example. – Laurence Gonsalves Dec 23 '10 at 22:05
  • @Laurence Gonsalves: **There is one major problem with this answer: the files should be open in BINARY MODE (Python 2.X) otherwise on Windows the CR LF processing messes up the results** – John Machin Dec 23 '10 at 23:07
  • @John: Thanks. I didn't know this. I work primarily on Linux where 'b' is a no-op. I see that the csv module docs' examples all use the binary flag (though I wish they explicitly said something about this) so I've updated the answer. BTW: making your comment bold wasn't really necessary. – Laurence Gonsalves Dec 23 '10 at 23:18
  • @Laurence Gonsalves: Python 2.7.1 docs "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference" is explicit enough, in my book. It should also say that it's strongly advised on portability grounds on platforms where it doesn't make a difference, and on forums with a multi-cultural audience. – John Machin Dec 23 '10 at 23:46
  • @John: I was looking for the word "binary", so I missed that note. I agree that that's explicit enough. – Laurence Gonsalves Dec 24 '10 at 00:48
  • I was using map to handle this, no way to do it if you're using res = map(process_row, reader)? I changed it to a for loop. – radtek Jan 09 '15 at 21:36
  • @radtek `map` can't add or delete elements, only modify elements. If you want a more functional approach than using a `for` loop you can either use `filter` or a comprehension. eg: `res = [process_row(row) for row in reader if row]` will remove empty rows and process the rest with `process_row`. – Laurence Gonsalves Jan 09 '15 at 22:56
  • @LaurenceGonsalves yep thats what I had done, I just used a for loop. – radtek Jan 11 '15 at 20:05
  • just avoid using `input` and `output` as variable names if you copy-paste this solution. It's bad practice to shadow built-in names – Kots Nov 06 '18 at 15:06
  • Got error: for row in csv.reader(input): _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) I'm about done with programming. Been doing it for years and still literally everything doesn't work. I spend entire days just trying to do stupid things like this BECAUSE NOTHING EVER WORKS!!! – Wayne Filkins May 25 '19 at 02:03
  • @WayneFilkins I understand your frustration. The answer was written over 8 years ago, and things have changed since then. I have updated the answer to work with modern versions of Python. – Laurence Gonsalves May 27 '19 at 23:56
  • @Kots: Good point. I've changed the names of those variables in the answer. It's unfortunate that Python uses the word "input" for an almost entirely useless function. – Laurence Gonsalves May 27 '19 at 23:57
  • @JohnMachin It's interesting to note that using the binary flag with Python 3.x's `csv` module (at least since 3.5) results in an error. The docs now say "`csvfile` can be any object which supports the iterator protocol and returns a string each time its `__next__()` method is called". Using the binary flag results in an object that when iterated returns an instance of `bytes`, not `str`. I've updated the answer to 3.x, and moved the binary flag for 2.x to a footnote. – Laurence Gonsalves May 28 '19 at 00:02
11

Surprised that nobody here mentioned pandas. Here is a possible solution.

import pandas as pd
df = pd.read_csv('input.csv')
df.to_csv('output.csv', index=False)
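Note that this works because `pandas.read_csv` skips fully blank lines by default (`skip_blank_lines=True`). Rows that contain only delimiters (e.g. a lone `,`) come through as all-NaN and survive the round trip; they would need an explicit `dropna`. A sketch with made-up data:

```python
import io

import pandas as pd

# Hypothetical two-column data: one truly blank line, one all-empty row.
raw = "a,b\n1,2\n\n,\n3,4\n"

df = pd.read_csv(io.StringIO(raw))  # the blank line is skipped by default
print(len(df))                      # 3 - the ',' line became a NaN,NaN row

clean = df.dropna(how='all')        # drop rows where every field is NaN
print(len(clean))                   # 2
```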
Sagun Shrestha
  • Pandas is too big of a library to use just for this case. If you are already using pandas for other stuff then this might be a viable option. – Aabesh Karmacharya Sep 02 '21 at 05:17
8

Delete empty rows from a .csv file using Python:

import csv

with open('demo004.csv', newline='') as in_file, open('demo005.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for row in csv.reader(in_file):
        if any(field.strip() for field in row):
            writer.writerow(row)

Thank you

5

Doing it with pandas is very simple. Open your csv file with pandas:

import pandas as pd
df = pd.read_csv("example.csv")
# check the number of empty rows in the csv file
print(df.isnull().sum())
# drop the empty rows
modifiedDF = df.dropna()
# save it to a new csv file
modifiedDF.to_csv('modifiedExample.csv', index=False)
Eric Aya
Hamza Tayyab
3

You have to open a second file, write all non-blank lines to it, delete the original file and rename the second file to the original name.

EDIT: a real blank line will be like '\n':

for line in f1.readlines():
    if line.strip() == '':
        continue
    f2.write(line)

A line with all blank fields would look like ',,,,,\n'. If you consider this a blank line:

for line in f1.readlines():
    if ''.join(line.split(',')).strip() == '':
        continue
    f2.write(line)

Opening, closing, deleting and renaming the files are left as an exercise for you. (hint: import os, help(open), help(os.rename), help(os.unlink))

EDIT2: Laurence Gonsalves brought to my attention that a valid csv file could have blank lines embedded in quoted csv fields, like 1, 'this\n\nis tricky',123.45. In this case the csv module will take care of that for you. I'm sorry Laurence, your answer deserved to be accepted. The csv module will also address the concerns about a line like "","",""\n.
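To make the embedded-newline case concrete, a quick sketch (the sample record is made up): a naive line-based filter sees a blank physical line in the middle of what is really one csv record, while `csv.reader` keeps the record intact.

```python
import csv
import io

# One logical csv record whose quoted field spans three physical lines.
raw = 'foo,"bar\n\nbaz",quux\n'

print(len(raw.splitlines()))          # 3 physical lines, one of them blank

rows = list(csv.reader(io.StringIO(raw)))
print(len(rows))                      # 1 logical row
print(rows[0])                        # ['foo', 'bar\n\nbaz', 'quux']
```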

Paulo Scardine
  • Ok. How do I check if a line is blank? I'm looking for the code please – debugged Dec 23 '10 at 18:39
  • Thanks Paulo. I have both situations going on in my csv. Blank lines and all blank fields in the line like you mentioned above. Now, what is the advantage/disadvantage of using your method over csv module? – debugged Dec 23 '10 at 19:00
  • @debugged: the csv method as described in the upvoted answer will not remove a line with all blank fields. If you are only filtering blanks out, the csv module seems like an overkill. If you will make further manipulation, then go with the csv module as it will split each csv line into a convenient python list. – Paulo Scardine Dec 23 '10 at 19:13
  • Great stuff! Thanks for your answers Paulo. I appreciate your being specific and detailed. – debugged Dec 23 '10 at 19:29
  • This answer trades correctness for simplicity. Yes, using the csv module is slightly more complicated, but it actually works correctly for cases like newlines embedded in quoted fields. – Laurence Gonsalves Dec 23 '10 at 19:33
  • @debugged: see Laurence Gonsalves comment where he outlines a situation where the cvs module is justified. – Paulo Scardine Dec 23 '10 at 19:40
  • -1 Using readlines is the pox. So is totally ignoring the fact that the file is a CSV file. **This answer should NOT be accepted!!!** – John Machin Dec 23 '10 at 21:28
  • @John Machin: the answer was edited to reflect the correct solution. Still the original is valid if you know the cvs will not have string fields, no need to be mean. – Paulo Scardine Dec 23 '10 at 22:56
  • @Paulo Scardine: Quite apart from the fact that one should use the csv module for processing CSV files irrespective of what one thinks that one knows, and that the OP didn't say that he had no strings in his CSV file, putting up crap code like readlines was enough to qualify for the -1. – John Machin Dec 23 '10 at 23:56
2

Python code to remove blank lines from a csv file without creating another file:

import csv

def ReadWriteconfig_file(file):
    try:
        file_object = open(file, 'r')
        lines = csv.reader(file_object, delimiter=',', quotechar='"')
        flag = 0
        data = []
        for line in lines:
            if line == []:
                flag = 1
                continue
            else:
                data.append(line)
        file_object.close()
        if flag == 1:  # a blank line was present in the file
            file_object = open(file, 'w')
            for line in data:
                file_object.write(','.join(line) + "\n")
            file_object.close()
    except Exception as e:
        print(e)
vaibhav
2

Here is a solution using pandas that removes blank rows.

import pandas as pd
df = pd.read_csv('input.csv')
df.dropna(axis=0, how='all', inplace=True)
df.to_csv('output.csv', index=False)
Aizayousaf
1

I need to do this, but without a blank row being written at the end of the CSV file, which this code unfortunately does (and which is also what Excel does if you Save -> .csv). My (even simpler) code using the csv module does this too:

import csv

input = open("M51_csv_proc.csv", 'rb')
output = open("dumpFile.csv", 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    writer.writerow(row)
input.close()
output.close() 

M51_csv_proc.csv has exactly 125 rows; the program always outputs 126 rows, the last one being blank.

I've been through all these threads and nothing seems to change this behaviour.

  • to avoid the 'close' lines, it is preferable to use (on reads) `with open(filename) as in_file:`; to avoid the addition of redundant \r or \n on writes, use `with open(filename, 'w+', newline='') as out_file:` – DarkLight Nov 05 '19 at 14:07
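Following the comment above, a Python 3 sketch of the round trip (file name hypothetical): opening files with `newline=''` lets the csv module control line endings itself, which avoids both the Windows `\r\r\n` blank-row effect on writes and any spurious empty row when reading back.

```python
import csv
import os
import tempfile

rows = [['a', 'b'], ['c', 'd']]
path = os.path.join(tempfile.mkdtemp(), 'dump.csv')

# newline='' on write: the csv module emits its own \r\n terminators
with open(path, 'w', newline='') as out_file:
    csv.writer(out_file).writerows(rows)

# newline='' on read: the trailing newline does not become a blank row
with open(path, newline='') as in_file:
    print(list(csv.reader(in_file)))    # [['a', 'b'], ['c', 'd']]
```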
0

In this script, all the CR/CRLF characters are removed from a CSV file that has lines like this:

"My name";mail@mail.com;"This is a comment.
Thanks!"

Execute the script https://github.com/eoconsulting/lr2excelcsv/blob/master/lr2excelcsv.py

Result (in Excel CSV format):

"My name",mail@mail.com,"This is a comment. Thanks!"
Unihedron
Mariano Ruiz
0

Replace PATH_TO_YOUR_CSV with your file path:

import pandas as pd

df = pd.read_csv('PATH_TO_YOUR_CSV')
new_df = df.dropna()
new_df.to_csv('output.csv', index=False)

or in-line:

import pandas as pd

pd.read_csv('data.csv').dropna().to_csv('output.csv', index=False)
Andrea Ciccotta
0

I had the same problem.

I converted the .csv file to a dataframe and after that I converted the dataframe back to the .csv file.

The initial .csv file with the blank lines was 'csv_file_logger2.csv'.

So I did the following:

import pandas as pd

df = pd.read_csv('csv_file_logger2.csv')
df.to_csv('out2.csv', index=False)
rawrex