I have a pandas dataframe like this:

    title   author              year    type  
0   t1      a1                  1980    article 
1   t2      ['a2', 'a3', 'a4']  1983    article 
2   t3      a5                  1982    article 
3   t4      a6                  1977    article 
4   t5      ['a7','a8']         2011    book 

This is a short example; the original dataframe is much larger.

And I need a dataframe like this:

    title   author   year   type  
0   t1      a1       1980   article
1   t2      a2       1983   article
2   t2      a3       1983   article 
3   t2      a4       1983   article 
4   t3      a5       1982   article 
5   t4      a6       1977   article 
6   t5      a7       2011   book
7   t5      a8       2011   book 

Note that the lists have different numbers of elements.

IvanMarkus
  • Possible duplicate of http://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list – bigbounty May 09 '17 at 00:54

1 Answer

import pandas as pd

# Expand each list of authors into separate rows and build an authors DataFrame
# (this only works if the author cells hold real Python lists, not strings)
df_author = df.author.apply(pd.Series).stack().rename('author').reset_index()

# Join the authors DataFrame back to the original on its row index
pd.merge(df_author, df, left_on='level_0', right_index=True, suffixes=('', '_old'))[df.columns]

Out[184]: 
  title author  year     type
0    t1     a1  1980  article
1    t2     a2  1983  article
2    t2     a3  1983  article
3    t2     a4  1983  article
4    t3     a5  1982  article
5    t4     a6  1977  article
6    t5     a7  2011  article
Allen Qin
  • This does not work. The result is the same as the first dataframe (with lists). – IvanMarkus May 09 '17 at 12:59
  • I think the list elements in the author column are not being interpreted as lists when I create the dataframe. The dataframe is created from a CSV with df=pd.read_csv('./file.csv', names=['title', 'author', 'year', 'type'], header=0, sep=';', low_memory=False). That is why your solution does not work. What can I do? – IvanMarkus May 09 '17 at 16:04
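
If the author column really does come out of read_csv as plain strings (as the last comment suggests), one possible approach is to parse those strings into lists first and then expand them. The sketch below is only an illustration under that assumption: parse_author is a hypothetical helper name, the small df stands in for the CSV contents, and DataFrame.explode requires pandas 0.25 or later.

```python
import ast

import pandas as pd


def parse_author(value):
    """Return a list of authors for one cell (hypothetical helper).

    Strings that look like a Python list, e.g. "['a2', 'a3']", are parsed
    with ast.literal_eval; anything else is wrapped in a one-element list.
    """
    text = str(value).strip()
    if text.startswith("[") and text.endswith("]"):
        return ast.literal_eval(text)
    return [text]


# Stand-in for the CSV contents: list cells arrive as plain strings.
df = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "author": ["a1", "['a2', 'a3', 'a4']", "a5", "a6", "['a7','a8']"],
    "year": [1980, 1983, 1982, 1977, 2011],
    "type": ["article", "article", "article", "article", "book"],
})

# Convert every author cell to a real list, then emit one row per author.
df["author"] = df["author"].apply(parse_author)
result = df.explode("author").reset_index(drop=True)
print(result)
```

Once the cells are real lists, the apply(pd.Series).stack() approach from the answer should also work in place of explode.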