
I want to unpack a dataframe column that contains a variable number of 'productID's nested in dictionaries inside lists in each row. Example table:

awardedProducts
0   []
1   [{'productID': 14306}]
2   []
3   []
4   []
5   []
6   []
7   [{'productID': 60974}, {'productID': 72961}]
8   [{'productID': 78818}, {'productID': 86765}]
9   [{'productID': 155707}]
10  [{'productID': 54405}, {'productID': 69562}, {...

I've tried iterating over the df with

df = []
for index, row in activeTitles.iterrows():   # iterrows() yields (index, row)
    df.append(row[0])                        # this only collects the lists, not the individual productIDs

I want to end up with a single-column dataframe, or a series, with all the productIDs listed, e.g.:

productID
0  14306
1  60974
2  72961
3  78818
4  86765
5  155707
6  54405
7  69562
lmonty
  • How did you get your dataframe in the first place, if I may ask? – Quang Hoang Jul 19 '19 at 17:14
  • It's from my ERP API. I query for active Promotions and the products that are contained within the promotions are nested. I need to get them into a simpler format so that I can compare against a list of new promotion items to check if any of them are already listed. – lmonty Jul 19 '19 at 17:21
  • Can you show a sample of your desired output given the provided input? – G. Anderson Jul 19 '19 at 17:21
  • FYI, the reason I asked is that the data does not look ideal for a dataframe. But if it comes from an API, there might not be much you can do about it. Another question, is `[{'productID': 14306}]` a list of dictionaries? – Quang Hoang Jul 19 '19 at 17:23

2 Answers


Since there is no flatmap operation in Pandas, you may do something like this:

import pandas as pd

data = pd.Series([[], [{'productID': 14306}], [], [], [], [], [],
                  [{'productID': 60974}, {'productID': 72961}],
                  [{'productID': 78818}, {'productID': 86765}],
                  [{'productID': 155707}], [{'productID': 54405}, {'productID': 69562}]])
# expand each list into columns of dicts, stack them into one long series
# (NaNs from the empty lists are dropped), then pull the productID out of each dict
products = (data.apply(pd.Series).stack()
            .apply(lambda p: p['productID']).reset_index(drop=True))
print(products)
# 0     14306
# 1     60974
# 2     72961
# 3     78818
# 4     86765
# 5    155707
# 6     54405
# 7     69562
# dtype: int64
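
If you'd rather avoid `apply(pd.Series)` (see the comment below), a plain list comprehension over the rows is another way to flatten the column; a minimal sketch, not part of the original answer, reusing the `data` Series defined above (`products_alt` is just an illustrative name):

# flatten the lists of dicts directly; empty lists contribute nothing
products_alt = pd.Series([d['productID'] for row in data for d in row], name='productID')
print(products_alt.tolist())
# [14306, 60974, 72961, 78818, 86765, 155707, 54405, 69562]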
jdehesa
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – BENY Jul 19 '19 at 18:24

Glad to share the new `explode` method added in pandas 0.25.0:

s=data.explode().str.get('productID').dropna()
s
Out[91]: 
1      14306.0
7      60974.0
7      72961.0
8      78818.0
8      86765.0
9     155707.0
10     54405.0
10     69562.0
dtype: float64
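
The empty lists become NaN when exploded, which is why the result above is float64. If integer IDs are wanted, a cast after `dropna` restores them; a small addition, not from the original answer (`s_int` is just an illustrative name):

s_int = data.explode().str.get('productID').dropna().astype('int64').reset_index(drop=True)
s_int.dtype
# dtype('int64')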

Sharing the function for those who do not want to update pandas:

unnesting(data.to_frame('pid'),['pid'],1)['pid'].str.get('productID').dropna()
Out[18]: 
1      14306
7      60974
7      72961
8      78818
8      86765
9     155707
10     54405
10     69562
Name: pid, dtype: int64

import numpy as np
import pandas as pd

def unnesting(df, explode, axis):
    if axis == 1:
        # repeat each row's index once per element of its list, then flatten the lists
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(explode, axis=1), how='left')
    else:
        # spread each list across new columns instead of new rows
        df1 = pd.concat([
                         pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, axis=1), how='left')
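
For reference, the `axis=1` branch works by repeating each row's index once per list element and flattening the lists with `np.concatenate`, which is why the repeated indices 7, 8 and 10 show up in the output above. A rough illustration of that step on the `data` Series from the first answer, not part of the original answer:

idx = data.index.repeat(data.str.len())    # 7, 8 and 10 appear twice; empty rows contribute nothing
flat = np.concatenate(data.values)         # one long array of productID dicts
pd.Series(flat, index=idx).str.get('productID')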
BENY
  • This is great but I need to update my pandas first. I will implement this at a later date. – lmonty Jul 19 '19 at 18:17
  • @lmonty personally I do not like the apply(pd.Series) method; if you would rather not update pandas, check https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939, and I will update – BENY Jul 19 '19 at 18:18