Dynamically append dict into empty Pandas.Dataframe

Question

I am parsing a massive text file (~10M lines) line by line with a regex to filter and clean up what I need.

Each `matched.groupdict()` returns `{'col1': '...', 'col2': '...', 'col3': '...'}`, which I would like to collect into a DataFrame. Just like in a database, each entry would have its own index.

Over the past few days, I did tons of research on SO, the Pandas.DataFrame docs, and Coursera material on DataFrames, and nothing worked. Most solutions suggest building a list of my `groupdict()` results and then creating a DataFrame from it, but that takes too much memory and I need it to be more dynamic.

What should I do?

import re
import pandas as pd

pattern = re.compile("(?P<col1>...)(?P<col2>...)(?P<col3>...)")
data = pd.DataFrame()
with open("massive.txt", 'r') as massive:
    for line in massive:
        matched = pattern.search(line)
        if matched:
            data.append(matched.groupdict(), ignore_index=True)

data
Empty DataFrame
Columns: []
Index: []

`append` is not an inplace operation for DataFrames, so you need to reassign, i.e. `data = data.append(...)`. — root, Mar 30 '17 at 17:58
So, did you look at the [documentation for `DataFrame.append`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) ? Because it quite clearly states "Append rows of other to the end of this frame, **returning a new object**." As a rule of thumb, though, you can pretty much assume no `pandas` methods act (by default) in-place. — juanpa.arrivillaga, Mar 30 '17 at 17:59
oh awkward. :D silly of me, I totally forgot to reassign. Thanks root and juanpa-arrivillaga :D — Ken, Mar 30 '17 at 18:04
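
To illustrate the point made in the comments, here is a minimal sketch of the "returns a new object" behaviour. It uses `pd.concat` as a stand-in, because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; the column name and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: one existing row and one new row to add.
data = pd.DataFrame({"col1": ["a"]})
row = pd.DataFrame([{"col1": "b"}])

# The result is discarded here, so `data` is left unchanged.
pd.concat([data, row], ignore_index=True)
assert len(data) == 1

# Reassigning keeps the new object that was returned.
data = pd.concat([data, row], ignore_index=True)
print(data)
```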

score 3 · Accepted Answer · answered Mar 30 '17 at 18:05

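... silly me. As the comments point out, `append` returns a new DataFrame instead of modifying `data` in place, so the result has to be assigned back: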

...
data = data.append(matched.groupdict(), ignore_index=True)
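
For readers on pandas 2.0 or later, where `DataFrame.append` no longer exists, the sketch below shows one possible alternative. It is not the approach from this answer. It assumes that collecting matches in small batches and concatenating them once at the end is acceptable. The regex, the sample text, the `rows_to_frame` helper, and the `chunk_size` parameter are all hypothetical stand-ins for the elided pattern and `massive.txt`.

```python
import io
import re

import pandas as pd

# Hypothetical pattern and input, standing in for the elided regex and massive.txt.
pattern = re.compile(r"(?P<col1>\w+),(?P<col2>\w+),(?P<col3>\w+)")
sample = io.StringIO("a,b,c\nnot a match\nd,e,f\ng,h,i\n")


def rows_to_frame(lines, pattern, chunk_size=2):
    """Collect matched rows in batches, then concatenate the batches once."""
    chunks, batch = [], []
    for line in lines:
        matched = pattern.search(line)
        if matched:
            batch.append(matched.groupdict())
            if len(batch) >= chunk_size:
                # Turn the full batch into a DataFrame and start a new batch.
                chunks.append(pd.DataFrame(batch))
                batch = []
    if batch:
        # Flush the last, partially filled batch.
        chunks.append(pd.DataFrame(batch))
    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)


data = rows_to_frame(sample, pattern)
print(data)
```

Growing a DataFrame one row at a time copies the existing rows on every iteration, whether through `append` or `pd.concat`, so batching usually scales better over millions of lines. In practice `chunk_size` would be much larger than the value used here.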

Ken
