-1

I have a text file which contains json in each row.

Sample Data: in file.text

{"id": "testid1","title": "testtitle1","link": "testlink1","description": "testdes2","entities": ["en1", "en2"]}
{"id": "testid2","title": "testtitle2","link": "testlink2","description": "testdes2","entities": [""]}
{"id": "testid1","title": "testtitle1","link": "testlink1","description": "testdesc","entities": ["en1", "en2", "en3"]}

Output required:

id  title   link    description entities__001   entities__002   entities__003
testid1 testtitle1  testlink1   testdes2    en1 en2 
testid2 testtitle2  testlink2   testdes2            
testid1 testtitle1  testlink1   testdesc    en1 en2 en3

Please suggest, how can I do the same in python?

I have tried to convert my file to csv online using https://json-csv.com/. However, It only supports up to 1 MB file in free account and my file size is around 200 MB. However, using this link I was able to covert it successfully with desired output.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Tajinder
  • 2,248
  • 4
  • 33
  • 54
  • 2
    Please show the code _you_ wrote to do this and describe the issues you're experiencing with it, if any. – ForceBru Aug 01 '19 at 15:07
  • maybe this question is what you need: [enter link description here](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) – elevadoatres Aug 01 '19 at 15:27

1 Answers1

1

First read the file and process the data (convert from string to json)

import json
with open(r".\data_file.txt") as f:
    data = f.readlines()
processed_data = [json.loads(line) for line in data]

then iterate over the documents to add new fields (to flatten the data). There are more efficient ways but this works.

import pandas as pd
for document in processed_data:
    for i in range(len(document["entities"])):
        document["entities_{}".format(i+1)] = document["entities"][i]
df = pd.DataFrame(processed_data)
#remove original column (if needed)
del df["entities"]

then just save as csv

df.to_csv(r"./out_folder/out_data.csv")