I have a pandas DataFrame with numpy arrays as values in one column. I would like to turn each element of an array into its own row, keeping the same date.

My DataFrame looks like this:

           date   website+
0    2014-11-26        [A]
238  2015-12-20     [B, C]
297  2016-02-17        [D]
303  2016-02-23  [E, F, G]

And I want:

           date website+
0    2014-11-26      [A]
238  2015-12-20      [B]
     2015-12-20      [C]
297  2016-02-17      [D]
303  2016-02-23      [E]
     2016-02-23      [F]
     2016-02-23      [G]

The index is not important as long as the date stays the same. I have found a solution that turns each entry into a column, but that's not exactly what I want.

jango

2 Answers

If the first column shown (0, 238, ...) is already the index, you can use the following:

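# Keep 'date' next to the existing index, spread each array into columns,
# stack those columns back into rows, then drop the helper level from stack().
df.set_index('date', append=True)['website+']\
  .apply(pd.Series).stack().reset_index(level=-1, drop=True)\
  .to_frame(name='website+')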

Output:

               website+
    date               
0   2014-11-26        A
238 2015-12-20        B
    2015-12-20        C
297 2016-02-17        D
303 2016-02-23        E
    2016-02-23        F
    2016-02-23        G
Scott Boston
  • Worked really well, thank you. Can the answer be extended if I have more than one column structured like website+? – jango Jan 25 '18 at 19:33
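
On the follow-up about several list-valued columns, here is a minimal sketch rather than a definitive answer. It assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns, and it assumes the lists in each row have the same length across the exploded columns (otherwise pandas raises a `ValueError`). The `visits` column is invented purely for illustration.

```python
import pandas as pd

# Sample frame from the question, with lists standing in for the arrays.
# 'visits' is a hypothetical second column with the same structure as 'website+'.
df = pd.DataFrame(
    {
        "date": ["2014-11-26", "2015-12-20", "2016-02-17", "2016-02-23"],
        "website+": [["A"], ["B", "C"], ["D"], ["E", "F", "G"]],
        "visits": [[1], [2, 3], [4], [5, 6, 7]],
    },
    index=[0, 238, 297, 303],
)

# Explode both columns in lockstep: one output row per list element.
# The date and the original index label are repeated for each new row.
result = df.explode(["website+", "visits"])

print(result)
```

If the original index is not needed, `result.reset_index(drop=True)` gives a fresh 0..n-1 index.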

Another solution

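import pandas as pd

df = pd.DataFrame({'date': ['2014-11-26', '2015-12-20', '2016-02-17', '2016-02-23'],
                   'website+': [['A'], ['B', 'C'], ['D'], ['E', 'F', 'G']]})

print(df)

def expand(row):
    # Wrap a scalar in a list so every row is handled the same way.
    ws = row['website+'] if isinstance(row['website+'], list) else [row['website+']]
    # One entry per distinct website, each carrying the row's date.
    return pd.Series(row['date'], index=list(set(ws)))

# apply() yields a wide frame (one column per website); stack() turns it back into rows.
df1 = df.apply(expand, axis=1).stack()

print(df1)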

Output:

         date   website+
0  2014-11-26        [A]
1  2015-12-20     [B, C]
2  2016-02-17        [D]
3  2016-02-23  [E, F, G]
0  A    2014-11-26
1  B    2015-12-20
   C    2015-12-20
2  D    2016-02-17
3  E    2016-02-23
   F    2016-02-23
   G    2016-02-23
dtype: object
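
The result above is a Series with the website in the index and the date as the value, which is not quite the two-column layout asked for. A possible way to reshape it is sketched below. The Series is rebuilt by hand from the printed output so the snippet runs on its own, and the level name `row` is a placeholder chosen for this example.

```python
import pandas as pd

# Rebuild the stacked Series shown in the output above.
idx = pd.MultiIndex.from_tuples(
    [(0, "A"), (1, "B"), (1, "C"), (2, "D"), (3, "E"), (3, "F"), (3, "G")]
)
df1 = pd.Series(
    ["2014-11-26", "2015-12-20", "2015-12-20", "2016-02-17",
     "2016-02-23", "2016-02-23", "2016-02-23"],
    index=idx,
)

tidy = (
    df1.rename("date")                     # the values are dates
    .rename_axis(["row", "website+"])      # 'row' is a placeholder level name
    .reset_index(level="website+")         # move the website level into a column
    [["date", "website+"]]                 # column order from the question
    .rename_axis(None)                     # drop the placeholder index name
)

print(tidy)
```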
Richard Rublev
  • That works fine. Is it possible to extend the answer to two columns with the same structure as website+? – jango Jan 25 '18 at 19:54