2

Ok so I am a relative noob to Python. I have a need for a transformation of the following dataframe

bd, date

[[None]], 2017-11-01 09:00:00

[[Sulphur], [Green Tea]], 2017-11-02 09:00:00 

[[Green Tea], [Jasmine]], 2017-11-03 09:00:00 

.....

to be transformed to

date, None, Sulphur, Green Tea, Jasmine...

2017-11-01 09:00:00, 1, 0, 0, 0...

2017-11-02 09:00:00, 0, 1, 1, 0...

2017-11-03 09:00:00, 0, 0, 1, 1...

The items in the embedded list in column BD are dynamic and cannot be predefined columns in the new dataFrame.

I tried the following through another helpful post Create new columns in pandas from python nested lists but could not adapt it successfully

suppDF1 = suppDF.bd.apply(lambda x: pd.Series(1, x)).fillna(0).astype(int)

Using the code above I only see 5 columns with incorrect 1s so I am clearly out of my depth.

Update

I tried Max's suggestion but I guess I may have something erroneous in my attempt at using pivot:

suppDF1 = suppDF.pivot(index="date", columns="bd")["bd"]

I get the following error

unhashable type: 'list'
Ash S
  • 23
  • 1
  • 4
  • Have you tried a [pivot](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html)? – Max von Hippel Dec 08 '17 at 22:06
  • 1
    Thanks for the suggestion Max - I tried it on my df but it gives me this error: unhashable type: 'list' I used the following code: suppDF1 = suppDF.pivot(index="date", columns="bd")["bd"] – Ash S Dec 08 '17 at 22:20
  • Wait, does `suppDF1 = suppDF.pivot(index="date", columns="bd")` throw an error, or does the error happen when you add in the indexing `["bd"]` at the end? – Max von Hippel Dec 08 '17 at 22:41
  • 1
    Either way I get the same error - even if I remove ["bd"] at the end, the error doesn't change. It does make me wonder, my data may not have unique dates. Is this a requirement for the pivot. If so, should I try adding an index column if uniqueness is a requirement – Ash S Dec 08 '17 at 22:59
  • 1
    Sorry I just wanted to point out, the contents of the BD column is a list, which may mean I need to somehow unpack this prior to trying the pivot. – Ash S Dec 08 '17 at 23:01
  • I don't think uniqueness is a requirement for the index to pivot on because I think it'll just concatenate different records matching the same index into a single line for that index value. So all rows with the same date join into one row. You might end up with a multi-index but I really doubt it. I think the problem is something else. – Max von Hippel Dec 08 '17 at 23:03

1 Answers1

0

I am sure there are more elegant, functional, pythonic ways to do this... and I would love to know what they are.

import numpy as np
import pandas as pd

# define dataframe
df = pd.DataFrame(columns = ['bd', 'date'])
df.loc[0, 'bd'] = [[None]]
df.loc[0, 'date'] = '2017-11-01 09:00:00'
df.loc[1, 'bd'] = [['Sulphur'], ['Green Tea']]
df.loc[1, 'date'] = '2017-11-02 09:00:00'
df.loc[2, 'bd'] = [['Green Tea'], ['Jasmine']]
df.loc[2, 'date'] = '2017-11-03 09:00:00'
print(df)

# set the index
df.set_index('date', inplace = True)

# df['bd'] contains doubly nested lists
# for item in column, for list in item, for string in list, add string to list
cols = []
for ls2 in df['bd']:
    for ls1 in ls2:
        for string in ls1:
            if string not in cols:
                cols.append(string)

# make a column for every string in df['bd']
for tea in cols:
    df[tea] = 0

# manual one-hot encoding; couldn't get pd.get_dummies() to work
for row in df.iterrows():
    for ls in row[1][0]:
        for el in ls:
            if el in df.columns:
                df.loc[row[0], el] = 1
df.drop('bd', axis = 1, inplace = True)
df.fillna(0)

I spent some time on this; here's some stuff that didn't totally work:

I wasn't able to get this recursive function to work for me (blame me, not the function)... Flatten (an irregular) list of lists

I tried get_dummies, but it can't hash a list, let alone a doubly nested list... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html

I tried pivoting and pivot_table... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html

I tried converting the list into a string, but that ended up being a dead end as well... Converting a Panda DF List into a string

Evan
  • 2,121
  • 14
  • 27
  • Ty Evan. I would imagine there is likely a more elegant way to do this.. but I really appreciate your answer. It works.. and since this is likely one of many cogs I would need.. I am happy to be progressing on my project with this approach. – Ash S Dec 09 '17 at 01:37