3

I have a DataFrame in which one column contains a list of tuples. I need to split each list of tuples into separate columns, with one row per tuple. My DataFrame df looks like this:

          A                                        B
[('Apple',50),('Orange',30),('banana',10)]        Winter   
[('Orange',69),('WaterMelon',50)]                 Summer 

The expected output should be:

    Fruit         rate             B
  Apple           50              winter   
  Orange          30              winter   
  banana          10              winter   
  Orange          69              summer   
  WaterMelon      50              summer 
vestland
Nayana Madhu

3 Answers

1

This should work:

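import pandas as pd

# Accumulators for the flattened columns
fruits = []
rates = []
seasons = []

def create_lists(row):
    # Append one (fruit, rate, season) entry per tuple in the row's list
    season = row['B']
    for fruit, rate in row['A']:
        fruits.append(fruit)
        rates.append(rate)
        seasons.append(season)

# apply is used here only for its side effect of filling the lists
df.apply(create_lists, axis=1)

new_df = pd.DataFrame({"Fruit": fruits, "Rate": rates, "B": seasons})[["Fruit", "Rate", "B"]]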

Output:

        Fruit  Rate       B
0       Apple    50  Winter
1      Orange    30  Winter
2      banana    10  Winter
3      Orange    69  Summer
4  WaterMelon    50  Summer
AndreyF
1

You can use the DataFrame constructor with numpy.repeat and numpy.concatenate:

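import numpy as np
import pandas as pd
# Stack all tuples into one 2-D array, then repeat each B value once per tuple
df1 = pd.DataFrame(np.concatenate(df.A), columns=['Fruit','rate']).reset_index(drop=True)
df1['B'] = np.repeat(df.B.values, df['A'].str.len())
print(df1)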
        Fruit rate       B
0       Apple   50  Winter
1      Orange   30  Winter
2      banana   10  Winter
3      Orange   69  Summer
4  WaterMelon   50  Summer
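
One thing to watch with the np.concatenate version: NumPy coerces the mixed (str, int) tuples to a single string dtype, so rate comes back as text rather than integers. A minimal sketch of casting it back, assuming the sample data from the question and that every rate is a whole number:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built)
df = pd.DataFrame({
    "A": [[("Apple", 50), ("Orange", 30), ("banana", 10)],
          [("Orange", 69), ("WaterMelon", 50)]],
    "B": ["Winter", "Summer"],
})

# np.concatenate turns each list of tuples into a 2-D string array
df1 = pd.DataFrame(np.concatenate(df["A"].tolist()), columns=["Fruit", "rate"])
df1["B"] = np.repeat(df["B"].to_numpy(), df["A"].str.len())

# At this point the rates are strings such as '50'
rate_is_text = all(isinstance(r, str) for r in df1["rate"])

# Cast back to integers so arithmetic and sorting behave numerically
df1["rate"] = df1["rate"].astype(int)
print(df1)
```

If the rates could be missing or non-numeric, pd.to_numeric with errors="coerce" may be a safer cast than astype(int).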

Another solution with chain.from_iterable:

from itertools import chain

# Flatten the lists of tuples; the parentheses allow the chained call to span lines
df1 = (pd.DataFrame(list(chain.from_iterable(df.A)), columns=['Fruit','rate'])
         .reset_index(drop=True))
df1['B'] = np.repeat(df.B.values, df['A'].str.len())
print(df1)
        Fruit  rate       B
0       Apple    50  Winter
1      Orange    30  Winter
2      banana    10  Winter
3      Orange    69  Summer
4  WaterMelon    50  Summer
jezrael
0

You can do this in a chained operation:

(
    df.apply(lambda x: [[k,v,x.B] for k,v in x.A],axis=1)
      .apply(pd.Series)
      .stack()
      .apply(pd.Series)
      .reset_index(drop=True)
      .rename(columns={0:'Fruit',1:'rate',2:'B'})
)
Out[1036]: 
        Fruit  rate       B
0       Apple    50  Winter
1      Orange    30  Winter
2      banana    10  Winter
3      Orange    69  Summer
4  WaterMelon    50  Summer
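
On pandas 0.25 or later, DataFrame.explode is another way to get one row per tuple. This is a different technique from the chain above, sketched here under the assumption that df is built as in the question; the names exploded, pairs and result are only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built)
df = pd.DataFrame({
    "A": [[("Apple", 50), ("Orange", 30), ("banana", 10)],
          [("Orange", 69), ("WaterMelon", 50)]],
    "B": ["Winter", "Summer"],
})

# One row per tuple; the matching B value is repeated automatically
exploded = df.explode("A").reset_index(drop=True)

# Split each (fruit, rate) tuple into its own columns
pairs = pd.DataFrame(exploded["A"].tolist(), columns=["Fruit", "rate"])

# Both frames share the same 0..n-1 index, so B lines up row by row
result = pairs.assign(B=exploded["B"])
print(result)
```

Because the tuples are unpacked by the DataFrame constructor rather than by NumPy, the rate column keeps its integer type.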
Allen Qin