I have a text file as shown below in the sample that I want to convert to a csv file (currently using Pandas).
The challenge is that I don't know in advance what the keys (column headers are) and in what order they are.
The final column order is not important.
Sample file:
name: john| dob: 10-06-1960| address: 4853 Radio Park Drive
name: jane| dob: 07-10-1973| address: 1537 Timbercrest Road| mobile: 706-289-6746
name: liam| dob: 12-08-1986| address: 4853 498 Fairmont Avenue| telephone number: 706-687-5021
name: chris| dob: 09-12-1965| address: 485 Green Avenue| state: California| Telephone Number: 510-855-5213
Desired output:
Name | dob | address | mobile | telephone number | state |
-----+------------+--------------------------+--------------+------------------+------------+
john | 10-06-1960 | 4853 Radio Park Drive | | | |
jane | 07-10-1973 | 1537 Timbercrest Road | 706-289-6746 | | |
liam | 12-08-1986 | 4853 498 Fairmont Avenue | | 706-687-5021 | |
chris| 09-12-1965 | 485 Green Avenue | | 510-855-5213 | California |
My Code:
import pandas as pd
df = pd.DataFrame()
file = open('D:\sample.log', 'r')
lines = file.readlines()
for line in lines:
pairs = line.split('|')
my_dict = {}
for pair in pairs:
key = pair.split(': ')[0].strip()
value = pair.split(': ')[1].strip()
my_dict[key] = value
df.append(my_dict, ignore_index=True)
This way of appending is very slow. How can I make this faster.
Or is there a much better solution (for example via a json string)?