Pandas - unflatten data frame with columns containing array

Question

I have a data frame which has been flattened on a specific property:

id      property_a    properties_b
id_1    property_a_1  [property_b_11, property_b_12]
id_2    property_a_2  [property_b_21, property_b_22, property_b_23]

..................

I'd like to expand the column properties_b to go back to a data frame looking like this:

id      property_a    property_b
id_1    property_a_1  property_b_11
id_1    property_a_1  property_b_12
id_2    property_a_2  property_b_21
id_2    property_a_2  property_b_22
id_2    property_a_2  property_b_23

..................

I suspect this is very simple with Pandas, but being new to Python, I struggle to find an elegant way to do so.

score 3 · Accepted Answer · edited May 23 '17 at 12:09

This question was addressed here and here. If you find those questions and answers useful, feel free to upvote them as well.

Setup

import pandas as pd

df = pd.DataFrame([
        ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
        ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
    ], columns=['id', 'property_a', 'properties_b'])

df

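rows = []
for i, row in df.iterrows():
    for a in row.properties_b:
        # Copy the row for each element; appending the same Series
        # object repeatedly would leave every row with the last value.
        new = row.copy()
        new['properties_b'] = a
        rows.append(new)

pd.DataFrame(rows, columns=df.columns)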
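As an alternative to the explicit loop: on pandas 0.25 or later, the built-in DataFrame.explode should give the same expansion in one call. This is a minimal sketch under that version assumption, not the method used in this answer; the rename to property_b and the reset_index call are my additions to match the layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame([
    ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
    ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
], columns=['id', 'property_a', 'properties_b'])

expanded = (
    df.explode('properties_b')                           # one row per list element
      .rename(columns={'properties_b': 'property_b'})    # assumption: match the question's column name
      .reset_index(drop=True)                            # explode repeats the original index labels
)
print(expanded)
```

The other columns are repeated for each list element, so no manual row copying is needed.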

Handy functions

def loc_expand(df, loc):
    rows = []
    for i, row in df.iterrows():
        vs = row.at[loc]
        for v in vs:
            # Copy inside the inner loop so each appended row is a distinct object.
            new = row.copy()
            new.at[loc] = v
            rows.append(new)

    return pd.DataFrame(rows)

def iloc_expand(df, iloc):
    rows = []
    for i, row in df.iterrows():
        vs = row.iat[iloc]
        for v in vs:
            # Copy inside the inner loop so each appended row is a distinct object.
            new = row.copy()
            new.iat[iloc] = v
            rows.append(new)

    return pd.DataFrame(rows)

These should both return the same result as above.

loc_expand(df, 'properties_b')
iloc_expand(df, 2)
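One way to check that claim, as a sketch: the snippet below assumes both helpers copy the row inside the inner loop, and it redefines them in compact form only so that it runs on its own. The names by_label and by_position are just illustrative.

```python
import pandas as pd

def loc_expand(df, loc):
    rows = []
    for _, row in df.iterrows():
        for v in row.at[loc]:
            new = row.copy()      # fresh copy per list element
            new.at[loc] = v
            rows.append(new)
    return pd.DataFrame(rows)

def iloc_expand(df, iloc):
    rows = []
    for _, row in df.iterrows():
        for v in row.iat[iloc]:
            new = row.copy()      # fresh copy per list element
            new.iat[iloc] = v
            rows.append(new)
    return pd.DataFrame(rows)

df = pd.DataFrame([
    ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
    ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
], columns=['id', 'property_a', 'properties_b'])

by_label = loc_expand(df, 'properties_b')   # select the column by name
by_position = iloc_expand(df, 2)            # select the column by 0-based position
print(by_label.equals(by_position))
```

Both results should keep the original row labels, so each label repeats once per list element; calling reset_index(drop=True) gives a fresh index if that is preferred.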

Thanks for the solution :). A small change is needed for it to work completely, though: in the first loop you are editing the row without cloning it first. In the handy functions, the copy() calls should be done in the inner loops:

def loc_expand(df, loc):
    rows = []
    for i, row in df.iterrows():
        vs = row.at[loc]
        for v in vs:
            new = row.copy()
            new.at[loc] = v
            rows.append(new)
    return pd.DataFrame(rows)

- raphael.glon, Jul 23 '21 at 15:02

score 3 · Answer 2 · answered Jul 25 '16 at 22:08

Here is another approach using to_records, some tuple-mapping and from_records.

import pandas as pd
import itertools

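def expand_column(df, col_id):
    # to_records() puts the index at position 0, so col_id counts columns from 1.
    records = map(lambda r: [r[1:col_id] + (l,) + r[col_id + 1:] for l in r[col_id]], map(tuple, df.to_records()))
    # Flatten the per-row lists into a single stream of expanded rows.
    return pd.DataFrame.from_records(itertools.chain.from_iterable(records), columns=df.columns)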

df = pd.DataFrame([['a', [1,2,3], 'a'],['b', [4,5], 'b']], columns=['C1', 'L', 'C2'])

print(df)
print(expand_column(df, 2))

#   C1          L C2
# 0  a  [1, 2, 3]  a
# 1  b     [4, 5]  b
#
#   C1  L C2
# 0  a  1  a
# 1  a  2  a
# 2  a  3  a
# 3  b  4  b
# 4  b  5  b
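Note that col_id here is effectively 1-based, because to_records() places the index in position 0 of each record. If selecting by column name is more convenient, a thin wrapper can translate the name into that position. The helper expand_column_by_name below is hypothetical (not part of this answer), and expand_column is repeated only so the sketch is self-contained.

```python
import itertools
import pandas as pd

def expand_column(df, col_id):
    # Same function as in the answer; col_id counts columns from 1.
    records = map(lambda r: [r[1:col_id] + (l,) + r[col_id + 1:] for l in r[col_id]],
                  map(tuple, df.to_records()))
    return pd.DataFrame.from_records(itertools.chain.from_iterable(records), columns=df.columns)

def expand_column_by_name(df, name):
    # Hypothetical wrapper: get_loc gives the 0-based position,
    # and +1 accounts for the index field that to_records() prepends.
    return expand_column(df, df.columns.get_loc(name) + 1)

df = pd.DataFrame([['a', [1, 2, 3], 'a'], ['b', [4, 5], 'b']], columns=['C1', 'L', 'C2'])
result = expand_column_by_name(df, 'L')
print(result)
```

This assumes the column labels are unique, since get_loc only returns a single integer in that case.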
