
I have a regular expression that I want to apply to each line of a CSV file.

Here's the function, which basically removes all commas encountered before a single-digit number. It works perfectly fine on a single string.

Input : text = "52A, XYZ Street, ABC District, 2, M, Brown"
Function : re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Output : '52A XYZ Street ABC District, 2, M, Brown'

However, I have a CSV file containing hundreds of such lines. For instance:

1, 5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
1, 5272849, 1063/36, 150-AS, S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
1, 5274549, 10626/12, 150-RT, S/O:XVZ, 1, ABX, 45, 0, Son, 9214

I tried to read it using csv.reader and apply the function to each row, but unfortunately it's not producing any output. What did I do wrong here?

import csv
import re

def myFunction(text):
    return re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)

with open('temp1.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        l = ','.join(row)
        myFunction(l)
Karl Knechtel
Rohit Girdhar
  • What output do you expect? Nothing is printed or written to a file in this code – PRMoureu May 21 '18 at 16:26
  • Does `myFunction` return anything? If so, the first thing you should do is assign it to a variable or else the output would be lost. – Matias Cicero May 21 '18 at 16:27
  • @MatiasCicero Yes, myFunction is returning the correct output as a standalone function. But not when I'm calling it inside the read CSV method. For some reason the output is getting lost. – Rohit Girdhar May 21 '18 at 16:29
  • @PRMoureu I expect the commas to get removed before the second occurrence of 1 right after S/O:XVZ – Rohit Girdhar May 21 '18 at 16:30
  • @RohitGirdhar I'm not following your comment. If your function *is returning* the output, why are you not storing it somewhere? Or at the very least, consuming it. – Matias Cicero May 21 '18 at 16:33
  • OK, you've been clear enough about the goal in your post, but what kind of output do you need exactly? Printed in a shell or written to a file? You only execute `myFunction` without handling the returned values – PRMoureu May 21 '18 at 16:33
  • @PRMoureu I need to write back to the same file I'm reading from. – Rohit Girdhar May 21 '18 at 16:35
  • @MatiasCicero You are correct. Storing it in a variable does get me the output. But I didn't quite get why it doesn't return the output if I don't store it in a variable, even though I am using the return statement. – Rohit Girdhar May 21 '18 at 16:40
  • @RohitGirdhar It was always returning the value, but if you don't handle it in any way it just gets lost in the void. The output won't get written to a file by itself. – Matias Cicero May 21 '18 at 16:46
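
To make the point from the comments concrete, here is a minimal sketch of the difference between calling the function and keeping its result. The two sample rows are copied from the question; the names `rows` and `cleaned` are only illustrative, and the pattern is written as a raw string.

```python
import re

def myFunction(text):
    # Same pattern as in the question, written as a raw string
    return re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)

rows = [
    "1, 5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213",
    "1, 5272849, 1063/36, 150-AS, S/O:XVZ, 1, ABX, 45, 0, Wife, 9253",
]

for row in rows:
    myFunction(row)  # the result is computed, then discarded

# Capturing the return value is what makes it available afterwards
cleaned = [myFunction(row) for row in rows]
for line in cleaned:
    print(line)
```

The first loop mirrors the code in the question: nothing appears because nothing consumes the returned string. The second version stores each result, so it can be printed or written out later.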

1 Answer

There's a handy module called fileinput that allows you to easily edit files line by line.

Take a look at:

import re
import fileinput

# For each line
for line in fileinput.input(files='example.csv', inplace=True, backup='.bak'):
    # Replace it with the regex you provided
    print(re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', line), end='')

Output:

Before:
1,  5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
1,  5272849, 1063/36, 150-AS, S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
1,  5274549, 10626/12, 150-RT, S/O:XVZ, 1, ABX, 45, 0, Son, 9214

After:
1,  5273249, 1061/72, 150-CF S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
1,  5272849, 1063/36, 150-AS S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
1,  5274549, 10626/12, 150-RT S/O:XVZ, 1, ABX, 45, 0, Son, 9214

This also creates a handy example.csv.bak file for you in case you need it!
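
If you want to try this without touching real data, here is a self-contained sketch of the same idea. It assumes a throwaway file in a temporary directory (the path and the single sample row are made up for illustration) and checks that both the rewritten file and the `.bak` copy exist afterwards.

```python
import fileinput
import os
import re
import tempfile

PATTERN = re.compile(r'(?<!\s[\dA-Z]),(?!\s+\d,?)')

# Hypothetical scratch file so the sketch does not touch real data
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'example.csv')
with open(path, 'w') as f:
    f.write("1, 5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213\n")

# With inplace=True, print() writes into the file instead of the console
with fileinput.input(files=path, inplace=True, backup='.bak') as lines:
    for line in lines:
        # end='' because each line already carries its own newline
        print(PATTERN.sub('', line), end='')

with open(path) as f:
    result = f.read()
backup_exists = os.path.exists(path + '.bak')
```

While the loop runs, standard output is redirected into the file, which is why `print` ends up doing the writing; it is restored once the `with` block closes.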

Aaron N. Brock
  • Never heard of fileinput before, but it does the job perfectly. This saved me a lot of time. Appreciate your help! – Rohit Girdhar May 21 '18 at 18:28
  • This is **not relevant to the problem**. The example *only incidentally* showed a fix, by `print`ing values rather than `return`ing them. However, it would usually be better to *accumulate* values from each function call. I added duplicate links to the question to show these techniques. – Karl Knechtel Aug 09 '22 at 01:52
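
For the "accumulate" approach mentioned in the last comment, a rough sketch could look like the following. The helper `clean_file` and the temporary file are hypothetical names introduced here for illustration, not something from the original post; the idea is simply to collect every return value in a list and then write the list back to the same file.

```python
import os
import re
import tempfile

def myFunction(text):
    return re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)

def clean_file(path):
    # Read and transform every line first, so the file is closed
    # before it gets reopened for writing
    with open(path, 'r') as f:
        cleaned = [myFunction(line) for line in f]
    with open(path, 'w') as f:
        f.writelines(cleaned)
    return cleaned

# Hypothetical scratch file standing in for temp1.csv
path = os.path.join(tempfile.mkdtemp(), 'temp1.csv')
with open(path, 'w') as f:
    f.write("1, 5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213\n")
    f.write("1, 5274549, 10626/12, 150-RT, S/O:XVZ, 1, ABX, 45, 0, Son, 9214\n")

cleaned = clean_file(path)
with open(path) as f:
    written = f.read()
```

This reads the file as plain lines rather than through `csv.reader`, since the regex works on the raw text of each line anyway. Unlike the `fileinput` version, it keeps no backup, so copy the file first if that matters.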