I am trying to read a large log file whose rows use different delimiters (a result of legacy changes).
This code works:
import os, subprocess, time, re
import pandas as pd
for root, dirs, files in os.walk('.', topdown=True):
    for file in files:
        # os.walk yields bare file names, so join each one with its directory
        path = os.path.join(root, file)
        df = pd.read_csv(path, sep='[,|;: \t]+', header=None, engine='python', skipinitialspace=True)
        for index, row in df.iterrows():
            print(row[0], row[1])
This works well for the following data:
user1@email.com address1
user2@email.com;address2
user3@email.com,address3
user4@email.com;;address4
user5@email.com,,address5
Issue #1: the following row in the input file breaks the code, because the ';' inside the address is also treated as a delimiter. I want this row parsed into 2 columns (not 3):
user6@email.com,,address;6
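One possible approach to Issue #1, as a minimal sketch rather than a definitive fix: split each line on the first run of delimiters only, so anything after it stays in the address column. The helper name `parse_two_columns` is hypothetical, and the sketch assumes every row has exactly two logical fields (email, then address) and that the email itself never contains a delimiter character.

```python
import io
import re

import pandas as pd

# Same delimiter characters as the original `sep` pattern.
DELIMS = re.compile(r"[,|;: \t]+")


def parse_two_columns(lines):
    """Hypothetical helper: split each line on the first delimiter run only."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1 keeps later delimiters (e.g. the ';' in 'address;6') intact
        parts = DELIMS.split(line, maxsplit=1)
        if len(parts) == 1:
            parts.append("")  # assumption: a row without a delimiter has no address
        rows.append(parts)
    return pd.DataFrame(rows, columns=[0, 1])


# Rows taken from the sample data above; a real file object works the same way.
sample = io.StringIO(
    "user1@email.com address1\n"
    "user5@email.com,,address5\n"
    "user6@email.com,,address;6\n"
)
df = parse_two_columns(sample)
print(df)
```

Inside the os.walk loop, this could take the place of the pd.read_csv call by passing an open file handle, for example `with open(path) as fh: df = parse_two_columns(fh)`.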
Issue #2: I want to replace all single and double quotes in the address column, but neither of the following seems to work:
df[1]=df[1].str.replace('"','DQUOTES')
df.replace('"', 'DQUOTES', regex=True)
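For Issue #2, one thing worth checking is that DataFrame.replace returns a new DataFrame and leaves df unchanged unless the result is assigned back. The sketch below uses made-up sample addresses, and the 'SQUOTES' placeholder is an assumption (only 'DQUOTES' appears in the attempts above). It replaces both kinds of quotes as literal strings and assigns the result to the column.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame(
    {0: ["a@example.com", "b@example.com"], 1: ['12 "Main" St', "O'Brien Rd"]}
)

# regex=False treats the quote characters as literal text.
# The result is assigned back to the column, otherwise df is left unchanged.
df[1] = (
    df[1]
    .str.replace('"', "DQUOTES", regex=False)
    .str.replace("'", "SQUOTES", regex=False)  # 'SQUOTES' is a hypothetical placeholder
)
print(df[1].tolist())
```

The same idea applies to the whole-frame form: `df = df.replace('"', 'DQUOTES', regex=True)` keeps the result, whereas calling it without assignment discards it.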
Please help!