Recently I was dealing with CSV files and tried to replace the NULL bytes in a CSV file with empty strings so that the CSV reader would work. I referred to this answer but decided to do it a different way.

The original solution looks like this:

import csv

with open(file) as f:
    reader = csv.reader(x.replace('\0','') for x in f)
    print([x for x in reader])

But I decided to do it like this:

import csv

with open(file) as f:
    for line in f:
        line.replace('\0','')
    f.seek(0)
    reader = csv.reader(f)
    print([x for x in reader])

My approach doesn't seem to work the way the original one does. Why is that? Thank you for your time!

  • `line.replace()` returns a new string; it doesn't modify the file object. – Barmar Oct 07 '21 at 20:19
  • Plus, `f.seek(0)` will reset the file to the start and will read the original file all over again. – quamrana Oct 07 '21 at 20:21
  • @quamrana He was expecting that to read the file as modified by `replace()`. – Barmar Oct 07 '21 at 20:24
  • Of course it doesn't do anything. On each iteration, `line.replace('\0','')` creates a new string, which is simply discarded. You do nothing with the result. – juanpa.arrivillaga Oct 07 '21 at 20:25
  • Also note, `[x for x in reader]` should just be `list(reader)`. There's **no point** in using a list comprehension here; list comprehensions exist to conveniently express mapping/filtering operations. So `[x for x in whatever]` should always just be `list(whatever)`, the same way `{k:v for k,v in whatever}` should always just be `dict(whatever)`. – juanpa.arrivillaga Oct 07 '21 at 20:26
  • So, if replace always returns a new string, why did the first implementation work? – DominiqueNobody Oct 07 '21 at 20:35
  • Sounds like an [X-Y Problem](https://xyproblem.info/). Why are there nulls in your CSV in the first place? Could it be that the file is UTF-16-encoded and you simply need to open it with the right encoding? – Mark Tolonen Oct 07 '21 at 20:37
  • @MarkTolonen Unfortunately not. Our TA just randomly generated some bytes, forced them into a .txt file, and asked us to CSV-ify it :( – DominiqueNobody Oct 07 '21 at 20:43
  • So it is an X-Y problem. The problem is the TA. – Mark Tolonen Oct 07 '21 at 20:45
  • Because in the first implementation, you are iterating over a generator of new strings. – juanpa.arrivillaga Oct 07 '21 at 21:07
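
To illustrate the last comment: a minimal sketch, assuming an in-memory `io.StringIO` stands in for the real file. The generator expression hands `csv.reader` a fresh, already-cleaned string for every line, and `csv.reader` accepts any iterable of strings, not only file objects. It also uses `list(reader)` as suggested above.

```python
import csv
import io

# Stand-in for open(file); io.StringIO is used only so the example
# runs without a file on disk.
f = io.StringIO("a,b\0,c\nd\0,e,f\n")

# Each item produced here is a NEW string with the NUL bytes removed;
# csv.reader consumes these cleaned strings instead of the raw lines.
cleaned_lines = (line.replace('\0', '') for line in f)
rows = list(csv.reader(cleaned_lines))
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```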

Take a look at the official documentation of the replace method in Python:

str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

In your implementation, you are calling replace but not capturing the returned string anywhere.
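A short sketch of what the documentation means in practice. Strings are immutable, so calling `replace` without keeping its result leaves the original string untouched, exactly as in the question's loop. The sample line is made up for illustration.

```python
line = "a,b\0,c\n"

# Result discarded, as in the question's loop: `line` is not modified.
line.replace('\0', '')
assert line == "a,b\0,c\n"

# Capture the returned copy instead.
cleaned = line.replace('\0', '')
print(repr(cleaned))  # 'a,b,c\n'

# The optional count argument limits how many occurrences are replaced.
print(repr("x\0y\0z".replace('\0', '', 1)))  # 'xy\x00z'
```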

Instead, you could read the whole file, replace the NULL bytes, and store the result in another variable, or, if the file is large, perform your operation inside the for loop itself.
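Both alternatives might look roughly like this. This is a sketch that assumes `io.StringIO` in place of `open(file)`. The per-line variant additionally assumes that no quoted field spans multiple lines, because it parses each physical line on its own with `next(csv.reader([cleaned]))`.

```python
import csv
import io

raw = "a,b\0,c\nd\0,e,f\n"

# Option 1: read everything, clean it once, then parse the cleaned text.
# Simple, but the whole file is held in memory.
f = io.StringIO(raw)  # stand-in for open(file)
text = f.read().replace('\0', '')
rows_whole = list(csv.reader(io.StringIO(text)))

# Option 2: clean and handle each line inside the loop, keeping the result.
# Assumption: no quoted field contains a newline, so a line is a full record.
f = io.StringIO(raw)  # stand-in for open(file)
rows_loop = []
for line in f:
    cleaned = line.replace('\0', '')   # keep the returned copy
    row = next(csv.reader([cleaned]))  # parse this single line
    rows_loop.append(row)              # or process the row right here
```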

However, the reference implementation you showed first looks better: it uses a generator expression that yields cleaned lines as the reader needs them, so you should stick with that.
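For completeness, here is the generator approach wrapped in a small function. The name `read_csv_without_nul` is hypothetical, and opening with `newline=''` follows the `csv` module documentation's recommendation rather than the original snippet. The usage part writes a throwaway temporary file so the example is self-contained.

```python
import csv
import os
import tempfile


def read_csv_without_nul(path):
    """Parse a CSV file, dropping NUL bytes lazily, line by line (hypothetical helper)."""
    with open(path, newline='') as f:
        # list() consumes the reader while the file is still open.
        return list(csv.reader(line.replace('\0', '') for line in f))


# Usage: create a small file containing NUL bytes, parse it, clean up.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("a,b\0,c\r\nd\0,e,f\r\n")
    path = tmp.name

rows = read_csv_without_nul(path)
os.remove(path)
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```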

Rodrigo Rodrigues