I am new to Python. I have thousands of CSV files in which a block of text follows the logged numeric data, and I would like to remove every row from the first text row onward. For example:
```
col 1  col 2    col 3
10     20       30
45     34       56
Start  8837sec  9items
Total  6342sec  755items
```
Fortunately, in every CSV file the text block begins with "Start" in column 1. I would like to remove that "Start" row and all the rows after it.
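A minimal sketch of that rule, assuming each file has already been parsed into a list of rows (lists of strings, as `csv.reader` yields) and that the marker is the text "Start" at the beginning of the first cell. `itertools.takewhile` keeps rows only until the first row that fails the test, so the "Start" row and everything after it are dropped. The names `is_start_row` and `drop_from_start` are hypothetical.

```python
from itertools import takewhile


def is_start_row(row):
    """Return True if the row's first cell begins with "Start" (surrounding spaces ignored)."""
    return bool(row) and row[0].strip().startswith("Start")


def drop_from_start(rows):
    """Keep rows up to, but not including, the first "Start" row."""
    return list(takewhile(lambda row: not is_start_row(row), rows))


rows = [
    ["col 1", "col 2", "col 3"],
    ["10", "20", "30"],
    ["45", "34", "56"],
    ["Start", "8837sec", "9items"],
    ["Total", "6342sec", "755items"],
]
print(drop_from_start(rows))
# [['col 1', 'col 2', 'col 3'], ['10', '20', '30'], ['45', '34', '56']]
```

Because `takewhile` stops at the first match, the later "Total" row is dropped as well, so a separate pattern for "Total" should not be needed.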
Here is what I have so far:
```python
import csv, os, re, sys

fileList = []
pattern = [r"\b(Start).*", r"\b(Total).*"]

# collect the regular files (not directories or links) in the working directory
for file in files:
    fullname = os.path.join(cwd, file)
    if not os.path.isdir(fullname) and not os.path.islink(fullname):
        fileList.append(fullname)

# read each file and write back only the rows that don't match the patterns
for file in fileList:
    try:
        ifile = open(file, "r")
    except IOError:
        sys.stderr.write("File %s not found! Please check the filename." %(file))
        sys.exit()
    else:
        with ifile:
            reader = csv.reader(ifile)
            writer = csv.writer(ifile)
            rowList = []
            for row in reader:
                rowList.append((", ".join(row)))
                for pattern in word_pattern:
                    if not (re.match(pattern, rowList)):
                        writer.writerow(elem)
```
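One way to restructure the per-file step, as a sketch: read all rows first, then write the kept rows through a separate handle opened for writing. In the script above the `csv.writer` is created on `ifile`, which was opened with `"r"` and therefore cannot be written to. The function `clean_csv` and the idea of writing to a different output path are assumptions for illustration; writing to a new file leaves the original untouched if something goes wrong.

```python
import csv
from itertools import takewhile


def clean_csv(src_path, dst_path):
    """Copy src_path to dst_path, stopping before the first row whose
    first cell starts with "Start". Returns the number of rows written.
    (Hypothetical helper; dst_path should differ from src_path.)
    """
    # newline="" is what the csv module docs recommend for files passed to csv.reader/writer
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))

    # keep rows until the marker row; the marker and everything after it are dropped
    kept = list(takewhile(lambda r: not (r and r[0].strip().startswith("Start")), rows))

    # open the destination separately, in write mode
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(kept)
    return len(kept)
```

Usage would look like `clean_csv("data.csv", "data_clean.csv")`, with hypothetical file names. No regular expression is needed here, since a plain `startswith` check on the first cell covers the "Start" marker.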
After running this script, I get blank CSV files. What should I change?
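For thousands of files, a sketch of the whole batch step, assuming the files sit in one folder and end in `.csv`, and that cleaned copies go into a separate output folder. It also defines the file list that `files` and `cwd` stood for in the script above. `clean_folder`, `src_dir` and `dst_dir` are hypothetical names. Rows are streamed, so each file is never fully loaded into memory.

```python
import csv
from pathlib import Path


def clean_folder(src_dir, dst_dir):
    """Write a truncated copy of every *.csv in src_dir into dst_dir.

    Rows are copied until the first row whose first cell starts with "Start";
    that row and everything after it are skipped. Returns the number of files written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # writing into the source folder would truncate each file before it is read
    if src_dir.resolve() == dst_dir.resolve():
        raise ValueError("dst_dir must be different from src_dir")
    dst_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    for src in sorted(src_dir.glob("*.csv")):
        # like the original script, skip links and anything that is not a regular file
        if src.is_symlink() or not src.is_file():
            continue
        with src.open(newline="") as fin, (dst_dir / src.name).open("w", newline="") as fout:
            writer = csv.writer(fout)
            for row in csv.reader(fin):
                if row and row[0].strip().startswith("Start"):
                    break  # marker found: nothing from here on is copied
                writer.writerow(row)
        count += 1
    return count
```

For example, `clean_folder("raw", "cleaned")` (hypothetical folder names). Once the cleaned copies look right, they can replace the originals; keeping input and output in different folders avoids truncating a file while it is still being read.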