Parse Large .csv File Rows with Python

Question

A large .csv file has a typical row with approximately 3000 data elements separated by commas. Approximately 50% of this data is fluff (non-value-added data) and can be removed. How can I remove this fluff with multiple string removals? I am new to Python.

I can read the data, but I am unable to change it. In the code below, the variable x would hold the changed string for each row.

with open('som_w.csv','r+') as file:
    reader = csv.reader(file, delimiter=',')
    for i, row in enumerate(reader):
        print(row)
        print(i+1)

writer = csv.writer(file, delimiter=',')
for row in writer:
    x = re.sub(r'<.*?>',"",writer)
    print(x)

file.close()

The current error is that the csv.writer object is not iterable. I believe I'm heading down the wrong path.

score 0 · Answer 1 · answered Sep 03 '19 at 03:18

Take a look at the comments in the code below. I think they should help.

with open('som_w.csv','r+') as file:
    reader = csv.reader(file, delimiter=',')
    for i, row in enumerate(reader):
        print(row)
        print(i+1)

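writer = csv.writer(file, delimiter=',') # `file` is already closed here; the `with` block has ended.
for row in writer: # a csv.writer is not iterable; loop over the reader instead.
    x = re.sub(r'<.*?>',"",writer) # re.sub() needs a string, not the writer object.
    print(x)

file.close() # while using `with`, it's unnecessary to close the file.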
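
A minimal sketch of how those points might fit together, assuming the cleaned rows go to a separate output file rather than back into 'som_w.csv'. The names `PATTERNS`, `clean_field`, `clean_csv` and the output file name are hypothetical, not from the question. The substitution happens inside the `with` block, on each field of each row produced by `csv.reader`, and the result is passed to `writer.writerow()`. The pattern list is there because the question asks about multiple removals.

```python
import csv
import re

# Patterns to strip from every field; append more to remove other fluff.
# The only one listed here is the pattern from the question.
PATTERNS = [re.compile(r'<.*?>')]

def clean_field(text):
    """Remove every match of every pattern in PATTERNS from one field."""
    for pattern in PATTERNS:
        text = pattern.sub('', text)
    return text

def clean_csv(src_path, dst_path):
    """Copy src_path to dst_path row by row, cleaning each field on the way."""
    # newline='' is what the csv module documentation recommends for csv files
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=',')
        writer = csv.writer(dst, delimiter=',')
        for row in reader:
            # row is a list of strings, so re.sub works on each element
            writer.writerow([clean_field(field) for field in row])
```

It could be called as `clean_csv('som_w.csv', 'som_w_clean.csv')`. Because rows are processed one at a time, the whole file never has to fit in memory, which matters for a large file.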

answered Sep 03 '19 at 03:18

Yeheshuah

Would I have to reopen "file", or place the re.sub() somewhere within the "with" block, in order to begin changing each row? – SpaetzleKing Sep 03 '19 at 03:35
Do you want to write the result to the same file ('som_w.csv' in your example)? – Yeheshuah Nov 26 '19 at 05:09

score 0 · Answer 2 · edited Sep 03 '19 at 04:49

Look at this post; it has an example of a function that replaces text in every line of a file with the help of a regular expression.

Then try this:

import fileinput
import sys

def replaceAll(file, searchExp, replaceExp):
    with fileinput.input(file) as f:
        for line in f:
            if searchExp in line:
                line = line.replace(searchExp, replaceExp)
            sys.stdout.write(line)

replaceAll('som_w.csv', r'<.*?>', "")

edited Sep 03 '19 at 04:49

Yeheshuah

answered Sep 03 '19 at 03:38

micharaze

I somewhat understand. Since replaceAll has a regular expression in the second argument, I assumed I needed to import re. In either case (with or without "import re"), the program appears to run without errors, but when I open som_w.csv the contents look the same. – SpaetzleKing Sep 05 '19 at 00:11
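
The unchanged file is probably explained by two things. `fileinput.input()` without `inplace=True` only reads the file, so `sys.stdout.write()` prints to the console instead of the file. Also, `in` and `str.replace()` treat `<.*?>` as literal text, not as a regular expression, so nothing matches. Below is a hedged sketch of a variant that addresses both, assuming the file really should be overwritten in place. The name `replace_all_regex` is hypothetical, and it is safer to try it on a copy first.

```python
import fileinput
import re

def replace_all_regex(path, search_pattern, replacement):
    """Rewrite the file at path in place, replacing regex matches line by line."""
    pattern = re.compile(search_pattern)
    # inplace=True redirects standard output into the file while iterating
    with fileinput.input(files=path, inplace=True) as f:
        for line in f:
            # end='' because each line already carries its own newline
            print(pattern.sub(replacement, line), end='')
```

It could be called as `replace_all_regex('som_w.csv', r'<.*?>', '')`. Note that this treats the file as plain text and does not parse the CSV structure, so a match that contains a comma would merge neighbouring fields.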
