Fastest way to convert a large text file (10 GB) containing lists of nested dictionaries to a pandas DataFrame

Asked May 13 '19 at 09:31

Active May 13 '19 at 09:52

Viewed 145 times

I have large text files containing lists of nested dictionaries in the following sample format:

[ {key1: value1, key2: value2, key3: {key3_key1: key3_value1,
                                      key3_key2: key3_value2}} ]

I am trying to convert each list into a pandas DataFrame row with the following columns:

columns = [key1, key2, key3_key1, key3_key2]
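A minimal sketch of the per-record flattening, assuming each list element is a dict and any nested dict (like `key3`) should simply have its keys lifted to the top level. The helper name `flatten_record` is hypothetical, and it assumes nested keys never collide with top-level keys, which holds for the sample above.

```python
def flatten_record(record):
    """Lift the keys of nested dicts to the top level (assumes no key collisions)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # key3's inner keys (key3_key1, key3_key2) become columns themselves
            flat.update(flatten_record(value))
        else:
            flat[key] = value
    return flat


record = {"key1": 1, "key2": 2, "key3": {"key3_key1": 31, "key3_key2": 32}}
print(flatten_record(record))
# {'key1': 1, 'key2': 2, 'key3_key1': 31, 'key3_key2': 32}
```

Because dicts keep insertion order, the flattened keys come out in the same order as the requested columns.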

What is the fastest way to build the complete DataFrame?

Edited May 13 '19 at 09:52 by jps; asked May 13 '19 at 09:31 by Srinath.


Is it a valid JSON file? – micric May 13 '19 at 09:46
To do it you need either `.apply` or a `lambda`; here is a post that may help you: https://stackoverflow.com/questions/51388201/fastest-way-to-create-a-pandas-column-conditionally – Mark May 13 '19 at 09:50
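For reference, the `.apply` route could look like the sketch below, assuming the records have already been parsed into Python dicts (the sample values are made up). It expands the nested `key3` column with `apply(pd.Series)`; note that this builds one `Series` per row, which tends to be slow on inputs of this size.

```python
import pandas as pd

# Hypothetical parsed records with the structure from the question.
records = [
    {"key1": 1, "key2": 2, "key3": {"key3_key1": 3, "key3_key2": 4}},
    {"key1": 5, "key2": 6, "key3": {"key3_key1": 7, "key3_key2": 8}},
]

df = pd.DataFrame(records)              # the key3 column holds dicts at this point
expanded = df["key3"].apply(pd.Series)  # one Series per row -> one column per inner key
df = df.drop(columns="key3").join(expanded)
print(df.columns.tolist())  # ['key1', 'key2', 'key3_key1', 'key3_key2']
```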
@micric, it's not a valid JSON file. It's a text file containing lists of nested dictionaries, and `json.loads` doesn't work on it. – Srinath May 13 '19 at 12:56
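If the file was produced by printing Python objects (single-quoted strings, `None`, `True`), which is an assumption here, `json.loads` will reject it while `ast.literal_eval` from the standard library can parse each line. A minimal check on one made-up line:

```python
import ast
import json

# Hypothetical line, as written by print()/str() on a Python list.
line = "[{'key1': 'a', 'key2': 2, 'key3': {'key3_key1': 3.5, 'key3_key2': None}}]"

try:
    json.loads(line)
    json_failed = False
except json.JSONDecodeError:
    # Single-quoted strings and None are not valid JSON.
    json_failed = True

# literal_eval only accepts Python literals (no function calls or names),
# unlike eval(), so it will not execute code embedded in the file.
records = ast.literal_eval(line)
print(json_failed)                      # True
print(records[0]["key3"]["key3_key1"])  # 3.5
```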
Thanks @Mark. I can read the text file line by line and append rows to a DataFrame, but isn't that too slow? Should I be sending chunks to different cores and then applying the row-append function? – Srinath May 13 '19 at 12:59
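One way to work in chunks without appending row by row is sketched below, assuming one Python-literal list per line (so `ast.literal_eval` can parse it) and nested dicts that flatten as in the sample. The helpers `flatten_record`, `chunk_to_frame`, and `read_in_chunks` are hypothetical names, and `chunk_size` is an arbitrary default. Each batch of lines becomes one DataFrame, and all batches are concatenated once at the end, which also keeps only one batch of raw text in memory at a time. Since `chunk_to_frame` takes a plain list of strings, batches could in principle be handed to `multiprocessing.Pool.imap` to use several cores, though pickling the resulting frames back to the parent process has its own cost.

```python
import ast
import io
from itertools import islice

import pandas as pd


def flatten_record(record):
    """Lift nested dict keys to the top level (assumes no key collisions)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(flatten_record(value))
        else:
            flat[key] = value
    return flat


def chunk_to_frame(lines):
    """Parse a batch of lines (one Python-literal list per line) into a DataFrame."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            rows.extend(flatten_record(r) for r in ast.literal_eval(line))
    return pd.DataFrame(rows)


def read_in_chunks(fh, chunk_size=100_000):
    """Read chunk_size lines at a time; concatenate all chunk frames once at the end."""
    frames = []
    while True:
        batch = list(islice(fh, chunk_size))
        if not batch:
            break
        frames.append(chunk_to_frame(batch))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo on an in-memory "file"; for the real data, pass an open file handle instead.
sample = io.StringIO(
    "[{'key1': 1, 'key2': 2, 'key3': {'key3_key1': 3, 'key3_key2': 4}}]\n"
    "[{'key1': 5, 'key2': 6, 'key3': {'key3_key1': 7, 'key3_key2': 8}}]\n"
)
df = read_in_chunks(sample, chunk_size=1)
print(df.shape)  # (2, 4)
```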
I would suggest reading it into a Python list of lists and then passing that to pandas in one bulk call. It's faster than appending rows to a DataFrame one at a time. – Pluckerpluck May 13 '19 at 15:13
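A minimal sketch of that suggestion, using made-up records that already have the question's structure. The slow pattern is shown only in a comment: growing the frame inside the loop copies the existing rows on every iteration, whereas collecting plain lists and calling the `DataFrame` constructor once does the conversion in a single pass.

```python
import pandas as pd

columns = ["key1", "key2", "key3_key1", "key3_key2"]

# Hypothetical parsed records; in practice these come from the text file.
records = [
    {"key1": i, "key2": 2 * i, "key3": {"key3_key1": 3 * i, "key3_key2": 4 * i}}
    for i in range(5)
]

# Slow pattern: growing the DataFrame inside the loop, e.g.
#     df = pd.concat([df, pd.DataFrame([row], columns=columns)])
# copies all existing rows on every iteration.

# Suggested pattern: build a plain list of lists first...
rows = [
    [r["key1"], r["key2"], r["key3"]["key3_key1"], r["key3"]["key3_key2"]]
    for r in records
]
# ...then hand it to pandas in a single constructor call.
df = pd.DataFrame(rows, columns=columns)
print(df.shape)  # (5, 4)
```

Note that `DataFrame.append`, which older answers use for row-at-a-time growth, was deprecated in pandas 1.4 and removed in pandas 2.0.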

0 Answers