I have large text files containing lists of nested dictionaries in the following sample format:

[ {key1:value1, key2:value2, key3: { key3_key1:key3_value1,
        key3_key2:key3_value2
      } } ]

I am trying to convert each list into one row of a pandas DataFrame with the following columns:

columns = [key1, key2, key3_key1, key3_key2]

What is the fastest way to build the complete DataFrame?

jps
Srinath
  • Is it a valid JSON file? – micric May 13 '19 at 09:46
  • To do it you need either `.apply` or `lambda`. Here is a post that may help you: https://stackoverflow.com/questions/51388201/fastest-way-to-create-a-pandas-column-conditionally – Mark May 13 '19 at 09:50
  • @micric, it's not a valid JSON file. It's a text file containing lists of nested dictionaries. json.loads doesn't work on it. – Srinath May 13 '19 at 12:56
  • Thanks @Mark, I can read the text file line after line and append rows to a dataframe. Isn't that too slow? Should I be taking chunks to different cores and then apply the row append function? – Srinath May 13 '19 at 12:59
  • I would suggest reading it into a Python list of lists, and then sending that to pandas in bulk. It's faster than appending rows to a DataFrame one at a time. – Pluckerpluck May 13 '19 at 15:13
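
A minimal sketch of the bulk approach suggested in the last comment, under some loud assumptions: each line of the text file is a Python-literal list (quoted keys, so `ast.literal_eval` can parse it even though `json.loads` cannot), and each list holds one nested dictionary. The helper names `flatten`, `rows_from_lines` and `to_frame` are made up for illustration. All rows are collected as flat dicts first, and the DataFrame is built once at the end instead of being appended to row by row.

```python
import ast


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts, e.g. {'key3': {'key1': 1}} -> {'key3_key1': 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, prefixing child keys with the parent key.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def rows_from_lines(lines):
    """Yield one flat dict per dictionary found on each non-empty line.

    Assumption: every line is a Python-literal list of nested dicts.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for record in ast.literal_eval(line):
            yield flatten(record)


def to_frame(lines):
    """Build the DataFrame in one call from all parsed rows."""
    # Imported here so the parsing helpers above work without pandas.
    import pandas as pd
    return pd.DataFrame(list(rows_from_lines(lines)))


# Hypothetical sample data standing in for the lines of the real file.
sample = [
    "[{'key1': 1, 'key2': 'x', 'key3': {'key1': 10, 'key2': 20}}]",
    "[{'key1': 2, 'key2': 'y', 'key3': {'key1': 30, 'key2': 40}}]",
]
rows = list(rows_from_lines(sample))
```

With a real file this would be called as `with open(path) as f: df = to_frame(f)`, where `path` is whatever the file is called. If the keys in the file really are unquoted, `ast.literal_eval` will fail too and a custom parser would be needed; the build-once step at the end stays the same.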

0 Answers