I have a Python list in which each string takes one of the following four forms (the names will differ, of course):
Mr: Smith\n
Mr: Smith; John\n
Smith\n
Smith; John\n
I want these converted to the following, with title and fname as literal placeholders for whichever part is missing:
Mr,Smith,fname\n
Mr,Smith,John\n
title,Smith,fname\n
title,Smith,John\n
This is easy enough to do with four re.sub() calls:
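import re

with open("path/to/file", 'r') as fileset:
    dataset = fileset.readlines()
dataset = [item.strip() for item in dataset]  # removes some misc. white noise
for i, item in enumerate(dataset):
    # [^:;,]+ stops a later pattern from re-matching a line that was already converted
    item = re.sub(r'^([^:;,]+):\W([^:;,]+);\W([^:;,]+)$', r'\g<1>,\g<2>,\g<3>', item)
    item = re.sub(r'^([^:;,]+);\W([^:;,]+)$', r'title,\g<1>,\g<2>', item)
    item = re.sub(r'^([^:;,]+):\W([^:;,]+)$', r'\g<1>,\g<2>,fname', item)
    item = re.sub(r'^([^:;,]+)$', r'title,\g<1>,fname', item)
    dataset[i] = item  # store the converted line back in the list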
While this is fine for the dataset I'm using, I want to be more efficient.
Is there a single operation that can simplify this process?
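For what it's worth, the kind of single operation I have in mind might look roughly like the sketch below: one pattern with an optional title group and an optional first-name group, plus a replacement function that fills in the placeholders. The names PATTERN, _fill and normalize are made up for illustration, and I'm assuming that no title, surname or first name ever contains a colon, semicolon or comma.

```python
import re

# Optional "title: " prefix, required surname, optional "; first name" suffix.
# Assumption: none of the three parts contains ':', ';' or ','.
PATTERN = re.compile(r'^(?:([^:;,]+):\s*)?([^:;,]+?)(?:;\s*([^:;,]+))?$')


def _fill(match):
    """Build 'title,surname,fname', using placeholders for missing groups."""
    title, surname, fname = match.groups()
    return ','.join([title or 'title', surname, fname or 'fname'])


def normalize(item):
    """Convert one raw line; lines that do not match are returned stripped but unchanged."""
    return PATTERN.sub(_fill, item.strip())


if __name__ == '__main__':
    for raw in ['Mr: Smith\n', 'Mr: Smith; John\n', 'Smith\n', 'Smith; John\n']:
        print(normalize(raw))
```

With something like this, the whole list could be converted in one pass with dataset = [normalize(item) for item in dataset], but I don't know whether that is actually the idiomatic way to do it.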
Please pardon if I forgot a quote or some such; I'm not at my workstation now and I'm aware I've stripped the newline (\n).
Thank you,