Python: How to parse a CSV file containing NULL values?

Question

I have a CSV file containing binary fields, and when I read it with csv.reader(f), I get an error about the input containing NULL values.

I've tried all kinds of solutions from the web, such as this, this and this, but the same error still pops up. I managed to read the file line by line and split each line on commas, but some fields also contain commas, so I'm wondering how I can read the file and extract the columns. An example row is below:

212344408,"cp233.net","net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","957596578@QQ.COM","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N

I would appreciate any suggestions.

Is this Python 2 or 3? Have you tried the `reader = csv.reader(line.translate({0: None}) for line in f)` approach (e.g. simply removing the NUL bytes)? — Martijn Pieters, Apr 06 '17 at 20:03
Possible duplicate of ["Line contains NULL byte" in CSV reader (Python)](http://stackoverflow.com/questions/7894856/line-contains-null-byte-in-csv-reader-python) — Satish Prakash Garg, Apr 06 '17 at 20:06
How do you need the null byte handled? Ignored or handled as `None` or something? — ivan_pozdeev, Apr 06 '17 at 20:43
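
The approach from the first comment above (dropping the NUL characters before the csv module sees them) could look roughly like this on Python 3. This is a minimal sketch: the helper name `read_csv_without_nuls` and the sample data are made up for illustration, and it assumes the NUL characters carry no information you need to keep.

```python
import csv
import io


def read_csv_without_nuls(f):
    """Yield parsed rows from a text file object, dropping NUL characters."""
    # str.translate with {0: None} deletes every U+0000 character (Python 3).
    cleaned = (line.translate({0: None}) for line in f)
    yield from csv.reader(cleaned)


# Hypothetical sample: the second field of the first row contains a NUL,
# and the third field contains a comma inside quotes.
sample = io.StringIO('1,"a\0b","x,y"\n2,"c","d"\n')
rows = list(read_csv_without_nuls(sample))
for row in rows:
    print(row)
```

Because the cleaned lines still go through csv.reader, quoted fields that contain commas stay intact, which a manual split on commas cannot guarantee.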

score 3 · Answer 1 · answered Apr 09 '17 at 00:24

Pandas worked great for my case and could retrieve the file and skip those rows that were broken because of weird characters.

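import pandas as pd

# filename is the path to the CSV file; header is the list of column names.
# warn_bad_lines and error_bad_lines exist only in pandas versions before 2.0.
df = pd.read_csv(filename, verbose=True, warn_bad_lines=True, error_bad_lines=False, names=header)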

Alex


Pandas is the way to go. The `read_csv` call above creates a DataFrame, which is a much more permissive structure, so you should not run into the same errors as when using the csv module. – krishnab Apr 09 '17 at 00:32
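
On newer pandas versions the `error_bad_lines` and `warn_bad_lines` arguments are no longer available (they were deprecated in 1.3 and removed in 2.0), and `on_bad_lines` takes their place. A rough equivalent of the call in the answer, assuming pandas 1.3 or later, might look like the sketch below. The column names and sample rows are invented for illustration, and the sketch only demonstrates skipping rows with too many fields; how rows containing NUL bytes are treated may depend on the pandas version.

```python
import io

import pandas as pd

# Hypothetical column names and sample data, for illustration only.
header = ["id", "domain", "tld"]
sample = io.StringIO(
    '1,"a.net","net"\n'
    '2,"b.net","net","unexpected extra field"\n'  # too many fields: skipped
    '3,"c.net","net"\n'
)

# on_bad_lines="skip" drops rows with too many fields instead of raising;
# on_bad_lines="warn" does the same but also emits a warning per dropped row.
df = pd.read_csv(sample, names=header, on_bad_lines="skip")
print(df)
```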

score 0 · Answer 2 · answered Apr 09 '17 at 00:32

This works fine on your example. I even replaced one string with NULL, and it was handled without any problems.

test.csv:

212344408,"cp233.net","net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","957596578@QQ.COM","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N
212344408,NULL,"net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","957596578@QQ.COM","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM","WANG MIN CHUN","WANG MIN CHUN","957596578@QQ.COM",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N

code:

import csv
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

If that's not the behaviour you're experiencing, could you provide a line where it fails?
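
If the goal is to get `None` for missing values rather than the raw text, the rows from csv.reader can be post-processed. The sketch below assumes that the trailing `\N` in the example row and the literal NULL in the second test row are both meant as "no value"; the names `NULL_MARKERS` and `rows_with_none` are made up for illustration. Note that csv.reader does not report whether a field was quoted, so a quoted "NULL" string would be converted as well.

```python
import csv
import io

# Assumption: these markers denote a missing value in this particular export.
NULL_MARKERS = {r"\N", "NULL"}


def rows_with_none(lines):
    """Parse CSV lines, replacing NULL marker fields with None."""
    for row in csv.reader(lines):
        yield [None if field in NULL_MARKERS else field for field in row]


# Shortened, hypothetical version of the second row in test.csv.
sample = io.StringIO('212344408,NULL,"net","FU,JIAN",0,\\N\n')
parsed = list(rows_with_none(sample))
print(parsed)
```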
