
I've noticed that several CSVs that come my way have random 'NUL' values scattered throughout the file. I noticed this because when I import the files into a database using an SSIS package I built, they throw a "no column delimiter found" error. I'm thinking about writing a Python script to clean these files up, but I can't find a solution to this problem. How would I use Python to remove these NUL characters?

I would include a picture, but I don't have enough reputation to include one.

Ex. "123456","Brown, Jim","","?NUL","",False,"8/16/2014 12:00:00 AM",""NUL,""InboNULund"

lsward
  • Are you saying that there are `\x00` characters as CSV elements? i.e. `foo,bar,\x00,tree`? – theorifice Aug 04 '16 at 15:34
  • @theorifice Yes. But there are also `\x00` characters inserted in strings i.e. `foo\x00bar, cheese, pizza, y\x00ellow` – lsward Aug 04 '16 at 15:45
  • Have you tried looping through the lines in the file and using the `string.replace` method? – haliphax Aug 04 '16 at 15:58
  • @haliphax I have, but not all the `\x00` characters are embedded in strings. It doesn't catch all of them. – lsward Aug 04 '16 at 17:16
  • What is generating the data? It seems the issue is that the CSV generator is providing garbled data. – theorifice Aug 04 '16 at 18:27
  • @theorifice I've asked the people that send me the reports. They are pulling reports from a couple different websites. I've come to the conclusion that the reporting tool is an afterthought on some of these sites, so there is a good possibility that I'm getting garbled data. Unfortunately, it's out of my hands and I'm just doing the best I can with what I have. – lsward Aug 04 '16 at 18:46
  • I don't see how, if you're treating an entire document as a string (or as a list of strings), that "not all the `\x00` characters are embedded in strings"... – haliphax Sep 02 '16 at 16:02
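The replace approach discussed in these comments can be sketched at the byte level. This is a minimal sketch, assuming the files are in a single-byte or UTF-8 encoding where `\x00` is never a legitimate part of a character (in a UTF-16 file, stripping NUL bytes would corrupt the text). `strip_nul_bytes` and its parameters are hypothetical names, not something from the thread. Because the file is opened in binary mode and treated as raw bytes, every NUL is handled the same way, whether it sits inside a quoted field, between delimiters, or at the end of a line.

```python
def strip_nul_bytes(src, dst, chunk_size=1 << 20):
    """Copy src to dst, dropping every NUL (0x00) byte.

    Returns the number of NUL bytes removed. Assumes the file is not
    UTF-16/UTF-32, where 0x00 bytes are part of ordinary characters.
    """
    removed = 0
    # Binary mode: no decoding and no newline translation, so nothing
    # other than the NUL bytes is changed.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x00")
            fout.write(chunk.replace(b"\x00", b""))
    return removed
```

Writing to a separate output file rather than editing in place keeps the original report available for comparison if the import still fails.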

1 Answer


I don't know how I missed this answer in my search, but this solution worked. It's odd that it worked, because I had tried the string replace method and it didn't seem to catch all of the NUL characters. I think the thorough answer provided by @JohnMachin in this post really laid the groundwork for solving the problem. He provides a comprehensive way to investigate the problem, and I suggest taking a look at it if you are having a similar issue: Python CSV error: line contains NULL byte
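For anyone who wants a starting point for that kind of investigation, here is a rough sketch. It is not taken from the linked answer; `inspect_nul_bytes` and its parameters are hypothetical. It reads the file as raw bytes, counts the NUL bytes, checks for a UTF-16 byte-order mark, and shows a few bytes of context around the first hits. If the count is close to half the file size, or a BOM is present, the file is probably UTF-16 and should be decoded rather than stripped.

```python
import codecs


def inspect_nul_bytes(path, context=10, max_hits=5):
    """Summarize where NUL (0x00) bytes occur in a file.

    Reads the whole file into memory, so it is meant for files of
    moderate size.
    """
    with open(path, "rb") as f:
        data = f.read()

    # Collect the offset and surrounding bytes for the first few NULs.
    hits = []
    pos = data.find(b"\x00")
    while pos != -1 and len(hits) < max_hits:
        start = max(0, pos - context)
        hits.append((pos, data[start:pos + context + 1]))
        pos = data.find(b"\x00", pos + 1)

    return {
        "size": len(data),
        "nul_count": data.count(b"\x00"),
        # A UTF-16 BOM suggests the NULs are part of the encoding.
        "utf16_bom": data.startswith(
            (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
        ),
        "first_hits": hits,
    }
```

Printing the `first_hits` entries with `repr()` makes the `\x00` bytes visible next to the text they interrupt.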

lsward