I have a text file that looks like this:
Id Number: 12345678
Location: 1234561791234567090-8.9
Street: 999 Street AVE
Buyer: john doe
Id Number: 12345688
Location: 3582561791254567090-8.9
Street: 123 Street AVE
Buyer: Jane doe @ buyer % LLC
Id Number: 12345689
Location: 8542561791254567090-8.9
Street: 854 Street AVE
Buyer: Jake and Bob: Owner%LLC: Inc
I'd like the file to look like this:
Id Number | Location | Street | Buyer |
---|---|---|---|
12345678 | 1234561791234567090-8.9 | 999 Street AVE | john doe |
12345688 | 3582561791254567090-8.9 | 123 Street AVE | Jane doe @ buyer % LLC |
12345689 | 8542561791254567090-8.9 | 854 Street AVE | Jake and Bob: Owner%LLC: Inc |
I have tried the following:
# 1 Read text file and ignore bad lines (lines with extra colons thus reading as extra fields).
tr = pd.read_csv('C:\\File Path\\test.txt', sep=':', header=None, error_bad_lines=False)
# 2 Convert into a dataframe/pivot table.
ndf = pd.DataFrame(tr.pivot(index=None, columns=0, values=1))
# 3 Clean up the pivot table to remove NaNs and reset the index (line by line).
nf2 = ndf.apply(lambda x: x.dropna().reset_index(drop=True))
Here is where got the last line (#3): https://stackoverflow.com/a/62481057/10448224
When I do the above and export to CSV the headers are arranged like the following:
(index) | Street | Buyer | Id Number | Location |
---|
The data is filled in nicely but at some point the Buyer field becomes inaccurate but the rest of the fields are accurate through the entire DF.
My guesses:
When I run #1 part of my script I get the following errors 507 times:
b'Skipping line 500: expected 2 fields, saw 3\nSkipping line 728: expected 2 fields, saw 3\
At the tail end of the new DF I am missing exactly 507 entries for the Byer field. So I think when I drop my bad lines, the field is pushing my data up.
Pain Points:
The Buyer field will sometimes have extra colons and other odd characters. So when I try to use a colon as a delimiter I run into problems.
I am new to Python and I am very new to using functions. I primarily use Pandas to manipulate data at a somewhat basic level. So in the words of the great Michael Scott: "Explain it to me like I'm five." Many many thanks to anyone willing to help.