I am attempting to add at least one, or even multiple columns to a dataframe from a mapped dictionary. I have a dictionary keyed on product catalog numbers containing a list of standardized hierarchical nomenclature for that product number. Example below.

import pandas as pd

dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})
df['catagory'] = df['product'].map(dict)
print(df)

I get the following result:

    product      catagory
0        1  [a, b, c, d]
1        2  [w, x, y, z]
2        3           NaN

I would like to obtain the following:

   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN

Or even better:

   product category
0        1        d
1        2        z
2        3      NaN

I have been trying to parse out just one of the items from the list within the dictionary and append it to the dataframe, but I have only found advice for mapping dictionaries whose lists contain a single item, per this EXAMPLE.

Any help appreciated.

Scott Boston
Rudabagle
  • This may be of help: https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows/32470490#32470490 – Alexander Jul 14 '17 at 22:43

3 Answers

Let's use set_index, apply, add_prefix, and reset_index:

df_out = (df.set_index('product')['catagory']
  .apply(lambda x:pd.Series(x)))

df_out.columns = df_out.columns + 1

df_out.add_prefix('cat').reset_index()

Output:

   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN

To take it to the next, "even better" step:

(df.set_index('product')['catagory']
  .apply(lambda x:pd.Series(x))
  .stack(dropna=False)
  .rename('category')
  .reset_index()
  .drop('level_1',axis=1)
  .drop_duplicates()
)

Output:

   product category
0        1        a
1        1        b
2        1        c
3        1        d
4        2        w
5        2        x
6        2        y
7        2        z
8        3      NaN
Scott Boston

Notice:

Never use built-in names such as list, type, or dict as variable names, because doing so shadows the built-in functions.

So if you use:

# dict is now a variable name
dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
# creating a dictionary with dict() is no longer possible, because dict now refers to the dictionary above
print (dict(a=1, b=2))

you get an error:

TypeError: 'dict' object is not callable

and debugging becomes very complicated. (After testing, restart the IDE.)

So use another variable name, such as d or categories:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
print (dict(a=1, b=2))
{'a': 1, 'b': 2}
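
The shadowing problem described above can be reproduced safely inside a function, so the built-in is never overwritten at module level. This is only a minimal sketch; the function name shadow_demo is made up for illustration.

```python
def shadow_demo():
    # Inside this function, `dict` is a local variable, not the built-in type.
    dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
    try:
        dict(a=1, b=2)  # tries to call the local dictionary object
    except TypeError as exc:
        return str(exc)
    return None


message = shadow_demo()
print(message)

# Outside the function, the name `dict` still refers to the built-in.
restored = dict(a=1, b=2)
print(restored)
```

If the name was already overwritten at module level, `del dict` should remove the shadowing variable and expose the built-in again, which is usually quicker than restarting.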

I think you need DataFrame.from_dict with join:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame( {"product": [1, 2, 3]})
print (df)
   product
0        1
1        2
2        3

df1 = pd.DataFrame.from_dict(d, orient='index')
df1.columns = ['cat' + (str(i+1)) for i in df1.columns]
print(df1)
  cat1 cat2 cat3 cat4
1    a    b    c    d
2    w    x    y    z

df2 = df.join(df1, on='product')
print (df2)
   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN
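
A sketch of the same from_dict and join approach for lists of different lengths. The data here is hypothetical (product 1 has only two levels), and it assumes that from_dict with orient='index' pads the shorter rows with missing values.

```python
import pandas as pd

# Hypothetical data: product 1 has two levels, product 2 has three.
d = {1: ['a', 'b'], 2: ['w', 'x', 'y']}
df = pd.DataFrame({"product": [1, 2, 3]})

# One row per dictionary key; shorter lists are padded with missing values.
df1 = pd.DataFrame.from_dict(d, orient='index')
df1.columns = ['cat' + str(i + 1) for i in df1.columns]

# Left join on the product number; product 3 has no match and stays empty.
df2 = df.join(df1, on='product')
print(df2)
```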

Then it is possible to use melt or stack:

df3 = df2.melt('product', value_name='category').drop('variable', axis=1)
print (df3)
    product category
0         1        a
1         2        w
2         3      NaN
3         1        b
4         2        x
5         3      NaN
6         1        c
7         2        y
8         3      NaN
9         1        d
10        2        z
11        3      NaN

# the parentheses are required so the chained calls can span several lines
df2 = (df.set_index('product').join(df1)
         .stack(dropna=False)
         .reset_index(level=1, drop=True)
         .rename('category')
         .reset_index())
print (df2)
    product category
0         1        a
1         1        b
2         1        c
3         1        d
4         2        w
5         2        x
6         2        y
7         2        z
8         3      NaN
9         3      NaN
10        3      NaN
11        3      NaN
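
As a possible alternative for the long format, and in the spirit of the "explode" comment under the question, a sketch using DataFrame.explode. This assumes pandas 0.25 or later, where that method is available, and it is not part of the original answer.

```python
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

long_df = (df.assign(category=df['product'].map(d))  # list (or NaN) per product
             .explode('category')                    # one row per list element
             .reset_index(drop=True))
print(long_df)
```

Unlike the stack version, a product with no entry in d keeps a single NaN row rather than one per category column.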

If the category column is already in df, the solution is similar; it is only necessary to first remove the rows with NaN using DataFrame.dropna:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame( {"product": [1, 2, 3]})
df['category'] = df['product'].map(d)
print(df)

df1 = df.dropna(subset=['category'])
df1 = pd.DataFrame(df1['category'].values.tolist(), index=df1['product'])
df1.columns = ['cat' + (str(i+1)) for i in df1.columns]
print(df1)
        cat1 cat2 cat3 cat4
product                    
1          a    b    c    d
2          w    x    y    z

df2 = df[['product']].join(df1, on='product')
print (df2)
   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN
jezrael

import numpy as np
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}

# Split the list for each product into 4 columns
# rename(columns=...) relabels columns 0..3 as cat1..cat4
df[['product']].join(
    df.apply(lambda x: pd.Series(d.get(x['product'], [np.nan])), axis=1)
      .rename(columns=lambda x: 'cat{}'.format(x + 1))
    )
Out[187]: 
   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN

#only take the last element
df['catagory'] = df.apply(lambda x: d.get(x['product'],[np.nan])[-1],axis=1)

df
Out[171]: 
   product catagory
0        1        d
1        2        z
2        3      NaN
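
Another way to get only the last element, sketched here as an assumption rather than as part of the answer above, is to reduce the dictionary to its last items first and then use a plain map. The name last_item is introduced only for illustration.

```python
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# Keep only the last (most specific) entry of each list.
last_item = {key: values[-1] for key, values in d.items()}

# Products missing from the dictionary map to NaN.
df['category'] = df['product'].map(last_item)
print(df)
```

This avoids a row-wise apply, which may matter on larger frames.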
Allen Qin