read file with pandas and create header

Question

I have several txt files that are formatted in this way

label1: value1 label2: string1 date: 2018-06-26 label3: value2 label4: string

I would like to read those files and create a database with headers and the corresponding values/strings, which I can then write to a file. Any help? Regards

Use pandas read_csv to read the text files and then merge all these into one dataframe — min2bro, Jun 27 '18 at 11:41
dataset_cormat = pd.read_csv('cormat_out.txt', delimiter=" ", header=None, names=["shot", "user", "date",'seq','written by']), but it's not what I want, as it cannot divide the data according to the headers I set — bruvio, Jun 27 '18 at 11:54
What is the separator between columns? Is it just a space, like between a column name and its value, or is it a tab? If it is something other than a plain space, you might find the answer here: https://stackoverflow.com/questions/38366494/how-to-read-text-files-key-value-pair-using-pandas (just change | to tab and = to : ) — Leonid Mednikov, Jun 27 '18 at 12:07

score 2 · Answer 1 · answered Jun 27 '18 at 12:20

Looks like you have a mapping between identifier labels and values. You can convert this into a dictionary via standard Python:

from io import StringIO

mystr = StringIO("""label1: value1 label2: string1 date: 2018-06-26 label3: value2 label4: string""")

# replace mystr with open('file.csv', 'r')
with mystr as fin:
    data = next(fin).strip().split()
    data_dict = {i[:-1]: j for i, j in zip(data[::2], data[1::2])}

print(data_dict)

{'date': '2018-06-26',
 'label1': 'value1',
 'label2': 'string1',
 'label3': 'value2',
 'label4': 'string'}

From here there are many options, depending on the exact format in which you want to output your data, e.g. pandas, csv, etc. You would need to provide more details for help with this step, but these options are worth investigating first.
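
As one possible next step, here is a minimal sketch that parses several such lines into dictionaries and writes them out with the standard-library csv module. The sample lines and the helper name parse_line are made up for illustration, and an in-memory buffer stands in for a real output file. It assumes that neither labels nor values contain spaces.

```python
import csv
from io import StringIO

def parse_line(line):
    """Turn 'label: value label: value ...' into a dict (hypothetical helper).

    Assumes labels and values contain no spaces.
    """
    tokens = line.strip().split()
    # Labels sit at even positions (with a trailing colon), values at odd ones.
    return {label.rstrip(':'): value
            for label, value in zip(tokens[::2], tokens[1::2])}

# Illustrative sample data, not taken from the asker's files.
lines = [
    "label1: value1 label2: string1 date: 2018-06-26",
    "label1: value3 label2: string2 date: 2018-06-27",
]
rows = [parse_line(line) for line in lines]

# StringIO stands in for open('out.csv', 'w', newline='').
out = StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

If a DataFrame is preferred, the same list of dictionaries can be passed to pd.DataFrame(rows) instead.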

Hamid Mir · Accepted Answer · 2018-06-27T12:26:18.520

If the data looks exactly like this:

Age: 39 Name: Jack date: 2018-06-26 Region: NY Open: Yes
Age: 21 Name: Rose date: 2018-09-16 Region: TX Open: NO

You need to split the text on the spaces in each line.

import pandas as pd

datalist = []
dlabels = []
# Adjust the path to point at your own file.
with open('D:\\1.txt', 'r') as f:
    for line in f:
        # strip() drops the trailing newline (if any); split() splits on whitespace
        words = line.strip().split()
        if not words:
            continue  # skip blank lines
        if not dlabels:
            # labels sit at even positions; drop the trailing colon
            dlabels = [w[:-1] for w in words[::2]]
        # values sit at odd positions
        datalist.append(words[1::2])

data = pd.DataFrame(datalist, columns=dlabels)
print(data)

output:
   Age  Name        date  Region  Open
0   39  Jack  2018-06-26      NY   Yes
1   21  Rose  2018-09-16      TX    NO

Thanks @DataScienceStep, that worked. I just had to edit the name of one label, as it had a space. I am now able to create the DataFrame! — bruvio, Jun 27 '18 at 13:16
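
For labels that contain a space (such as 'written by' in the comment above), splitting on spaces breaks the label/value pairing. A regular expression is one possible alternative to renaming the label. This is only a sketch: the pattern, the helper name parse_line, and the sample values are assumptions made for illustration, and it still assumes that values themselves never contain spaces.

```python
import re

# A label is a run of word characters, optionally with inner spaces, ending
# at a colon; a value is the run of non-space characters that follows.
# Assumption: values never contain spaces.
PAIR = re.compile(r"(\w[\w ]*?):\s*(\S+)")

def parse_line(line):
    """Return a dict of label -> value for one line (hypothetical helper)."""
    return {label: value for label, value in PAIR.findall(line)}

# Labels taken from the comment thread; the values are invented.
sample = "shot: 92000 user: alice date: 2018-06-26 seq: 12 written by: jdoe"
record = parse_line(sample)
print(record)
```

Applying parse_line to every line of a file gives a list of dictionaries, which pd.DataFrame accepts directly.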
