Create a Dataframe from a Series and a String

Question

Suppose I have a dataframe with columns with Strings, Series and Integers that I would like to combine into a new dataframe with the String and the Integer combined with every entry in the Series. How could I go about it?

Given this example:

data = {'fruits': ['banana', 'apple', 'pear'], 
    'source' : (['brazil', 'algeria', 'nigera'], ['brazil', 'morocco', 'iran', 'france'], ['china', 'india', 'mexico']),
    'prices' : [2, 3, 7]}
df = pd.DataFrame(data, columns = ['fruits', 'source', 'prices'])

I would like to get a 3x10 dataframe with;

['banana', 'banana', 'banana', 'apple', 'apple', 'apple', 'apple', 'pear', 'pear', 'pear'],
['brazil', 'algeria', 'nigera', 'brazil', 'morocco', 'iran', 'france', 'china', 'india', 'mexico'],
['2', '2', '2', '3', '3', '3', '3', '7', '7', '7'],

I guess it shouldn't be too complex but I can't find a neat solutions.

score 7 · Accepted Answer · answered Dec 13 '17 at 17:06

7

Use explode() function:

In [30]: explode(df, lst_cols='source')
Out[30]:
   fruits   source  prices
0  banana   brazil       2
1  banana  algeria       2
2  banana   nigera       2
3   apple   brazil       3
4   apple  morocco       3
5   apple     iran       3
6   apple   france       3
7    pear    china       7
8    pear    india       7
9    pear   mexico       7

answered Dec 13 '17 at 17:06

MaxU - stand with Ukraine

205,989
36
386
419

This is nice : -) just like `unnest` in R , hope `pandas` can consider add it to their module – BENY Dec 13 '17 at 17:08
It's so nice to be able to look back on your old answers and reuse code. – cs95 Dec 13 '17 at 17:08
Thank you guys! :-) – MaxU - stand with Ukraine Dec 13 '17 at 17:09
5

These pandas func names sounds like some pyscho movie actions - explode, melt, cut ;) – Divakar Dec 13 '17 at 17:20
3

@Divakar *chuckle* Just need a few more like - tie, handcuff, squeeze... wait that one already exists in numpy. ;) – Scott Boston Dec 13 '17 at 17:28

BENY · Answer 2 · 2017-12-13T17:11:40.987

By using stack and apply(pd.Series)

df.set_index(['fruits','prices']).source.apply(pd.Series).\
      stack().reset_index(level=['fruits','prices']).\
             rename(columns={0:'source'})
Out[64]: 
   fruits  prices   source
0  banana       2   brazil
1  banana       2  algeria
2  banana       2   nigera
0   apple       3   brazil
1   apple       3  morocco
2   apple       3     iran
3   apple       3   france
0    pear       7    china
1    pear       7    india
2    pear       7   mexico

Op2 recreate your df

df1=df[['fruits','prices']].reindex(df.index.repeat(df.source.apply(len)))
df1['source']=np.concatenate(df.source.values)
df1
Out[69]: 
   fruits  prices   source
0  banana       2   brazil
0  banana       2  algeria
0  banana       2   nigera
1   apple       3   brazil
1   apple       3  morocco
1   apple       3     iran
1   apple       3   france
2    pear       7    china
2    pear       7    india
2    pear       7   mexico

cs95 · Answer 3 · 2019-01-13T06:57:05.207

My shot at this using concat + melt.

c = ['fruits', 'prices']
df =  (pd.concat([pd.DataFrame(df.source.tolist()), df[c]], 1)
         .melt(c, value_name='source')
         .drop('variable', 1)
         .dropna())

df

    fruits  prices   source
0   banana       2   brazil
1    apple       3   brazil
2     pear       7    china
3   banana       2  algeria
4    apple       3  morocco
5     pear       7    india
6   banana       2   nigera
7    apple       3     iran
8     pear       7   mexico
10   apple       3   france

Create a Dataframe from a Series and a String

3 Answers3