
I have a CSV file with 100K+ lines of data in this format:

"{'foo':'bar' , 'foo1':'bar1', 'foo3':'bar3'}"


"{'foo':'bar' , 'foo1':'bar1', 'foo4':'bar4'}"

The quotes are there before the curly braces because my data came in a CSV file.

I want to extract the key value pairs in all the lines to create a dataframe like so:

Column Headers: foo, foo1, foo3, foo...


Rows:           bar, bar1, bar3, bar...

I've tried implementing something similar to what's explained here (Python: error parsing strings from text file with Ast module).

I've gotten the ast.literal_eval function to work on my file, converting the contents into a dict, but how do I now get the DataFrame function to work? I am very much a beginner, so any help would be appreciated.

import pandas as pd
import ast

with open('file_name.csv') as f:
        for string in f:
            parsed = ast.literal_eval(string.rstrip())
            print(parsed)


pd.DataFrame(???)
Andrej Kesely
  • Possible duplicate of [Convert a String representation of a Dictionary to a dictionary?](https://stackoverflow.com/questions/988228/convert-a-string-representation-of-a-dictionary-to-a-dictionary) – eva-vw Jul 26 '19 at 19:20
  • Follow this answer to append the dict from every row to the dataframe: https://stackoverflow.com/a/43957800/6863323 – Mohit Rajpoot Jul 26 '19 at 19:27

2 Answers


You can turn a dictionary into a pandas dataframe using pd.DataFrame.from_dict, but when every value is a scalar, pandas cannot infer a row index, so it expects each value in the dictionary to be wrapped in a list.

for key, value in parsed.items():
   parsed[key] = [value]

df = pd.DataFrame.from_dict(parsed)
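
To see why the wrapping step matters, here is a minimal sketch using one of the sample dicts from the question. The names scalar_error, wrapped, df_wrapped and df_single are only illustrative. It also shows what I believe is an equivalent shortcut: passing a list that holds the dict, which pandas reads as a single row.

```python
import pandas as pd

parsed = {'foo': 'bar', 'foo1': 'bar1', 'foo3': 'bar3'}

# With all-scalar values pandas has no row index to infer, so this raises ValueError.
try:
    pd.DataFrame.from_dict(parsed)
    scalar_error = None
except ValueError as exc:
    scalar_error = exc

# Wrapping each value in a one-element list yields a single-row frame.
wrapped = {key: [value] for key, value in parsed.items()}
df_wrapped = pd.DataFrame.from_dict(wrapped)

# Shortcut: a list containing one dict is treated as one row.
df_single = pd.DataFrame([parsed])
```

Using a new dict (wrapped) instead of overwriting parsed keeps the original values intact in case you need them later.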

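You can do this iteratively by concatenating each one-row frame onto your dataframe and assigning the result back, since concatenation returns a new frame rather than modifying df in place.

df = pd.DataFrame()
with open('file_name.csv') as f:
    for string in f:
        parsed = ast.literal_eval(string.rstrip())
        # Wrap each scalar in a list so from_dict builds a one-row frame.
        row = {key: [value] for key, value in parsed.items()}
        # pd.concat returns a new frame, so assign it back to df.
        df = pd.concat([df, pd.DataFrame.from_dict(row)], ignore_index=True)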
eva-vw

parsed is a dictionary, so you can make a one-row dataframe from each line and then join all the frames together:

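frames = []
with open('file_name.csv') as f:
    for string in f:
        parsed = ast.literal_eval(string.rstrip())
        # Skip any line that did not evaluate to a dict.
        if not isinstance(parsed, dict):
            continue

        # index=[0] is needed because all the values are scalars.
        subDF = pd.DataFrame(parsed, index=[0])
        frames.append(subDF)

df = pd.concat(frames, ignore_index=True, sort=False)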

Calling pd.concat once on a list of dataframes is faster than calling DataFrame.append repeatedly. sort=False means that pd.concat will not sort the column names when it encounters a new one, like foo4 on the second row.
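
As a possible alternative to concatenating one-row frames, pd.DataFrame also accepts a list of dicts directly and fills keys missing from a row with NaN. The sketch below is not part of the approach above. It uses the two sample rows from the question with the outer CSV quotes already removed, and the names lines, records and df_records are only illustrative.

```python
import ast
import pandas as pd

# Two sample lines in dict-literal form (outer CSV quotes assumed already stripped).
lines = [
    "{'foo':'bar' , 'foo1':'bar1', 'foo3':'bar3'}",
    "{'foo':'bar' , 'foo1':'bar1', 'foo4':'bar4'}",
]

records = []
for line in lines:
    parsed = ast.literal_eval(line.strip())
    # Keep only lines that really evaluated to a dict.
    if isinstance(parsed, dict):
        records.append(parsed)

# One constructor call builds the whole frame; missing keys become NaN.
df_records = pd.DataFrame(records)
```

With 100K+ lines this avoids creating one small dataframe per line, which I would expect to be noticeably cheaper, though I have not timed it on your data.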

Code Different
  • This makes sense! I tried this out and got a ValueError: DataFrame constructor not properly called! error. Perhaps because parsed is still being treated as a string? When I tried checking type(parsed) str was returned. – trynagetajob Jul 26 '19 at 21:20
  • That appears to be the bug. Maybe ignore the line if `parsed` cannot be converted into a dict? – Code Different Jul 26 '19 at 21:23
  • This is something I've noticed. I can individually add a line from my data set and it gets converted into a dict when I run ast.literal_eval on it. But when I try ast.literal_eval on the file as a whole, and check type() afterwards, str is returned. – trynagetajob Jul 26 '19 at 21:30
  • Which line should I ignore? – trynagetajob Jul 26 '19 at 21:34
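
The str result described in these comments is consistent with the sample data in the question, where each line is wrapped in double quotes. In that case ast.literal_eval sees a string literal and returns the inner text as a str. A minimal sketch of one way around this, assuming every line in the file looks like the samples: evaluate a second time when the first result is a string. parse_line is a hypothetical helper, not something from the answers above.

```python
import ast

def parse_line(line):
    """Hypothetical helper: parse one line that may be wrapped in outer quotes."""
    text = line.strip()
    if not text:
        return None  # skip blank lines
    value = ast.literal_eval(text)
    # "{'foo': 'bar'}" with the outer quotes evaluates to a str first,
    # so evaluate that inner text once more to reach the dict.
    if isinstance(value, str):
        value = ast.literal_eval(value)
    return value

raw = "\"{'foo':'bar' , 'foo1':'bar1', 'foo3':'bar3'}\"\n"
first_pass = ast.literal_eval(raw.strip())
parsed = parse_line(raw)
```

Here first_pass is a str while parsed is a dict, which would explain why pasting a single line without its outer quotes works but reading the file does not.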