1

I am a beginner and would appreciate some help :) I want to reduce the amount of rows by combining rows based on a variable.

I have given data set:

d = {
   'day': ['one', 'one', 'two', 'three', 'three'], 
   'switch': [0, 1, 0, 1, 0 ],
   'value': ['green', 'red', 'blue', 'orange', 'purple']
}
df = pd.DataFrame(data=d)
print(df)

    day     switch  value
0   one     0       green
1   one     1       red
2   two     0       blue
3   three   1       orange
4   three   0       purple

I try to restructure the data set to look like this:

    day      switch_0_value    switch_1_value
0   one      green             red
1   two      blue              NAN
2   three    purple            orange

I have looked at 'combine_first' but don't know how to apply that within a loop.

Thank you so much in advance!

rob mayoff
  • 375,296
  • 67
  • 796
  • 848
Dana_Miles
  • 399
  • 3
  • 17
  • check this - https://stackoverflow.com/questions/61922051/pandas-dataframe-convert-time-series-with-multiple-elements-to-a-flattened-da/61928523#61928523 – Hietsh Kumar May 24 '20 at 11:17

3 Answers3

2

Try something like this:

result = df[df["switch"] == 0][["day", "value"]].rename(columns = {"value":"switch_0_value"})
result = pd.merge(result, df[df["switch"] == 1][["day", "value"]], how='left').rename(columns = {"value":"switch_1_value"})
result
    day    switch_0_value   switch_1_value
0   one    green            red
1   two    blue             NaN
2   three  purple           orange
Let's try
  • 1,044
  • 9
  • 20
  • Hey Let's try, this was a great help! Thank you for helping me out and introducing me to a new approach. It worked :) – Dana_Miles May 24 '20 at 12:15
0

You can use df.unstack here.

df.set_index(['day','switch']).value.unstack(-1)

switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN

Or df.pivot

df.pivot(index='day',columns='switch',values='value')

switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN

or using df.groupby with agg and pd.NamedAgg is just for adding custom named columns

df.groupby('day').agg(switch_0=pd.NamedAgg(column='value',aggfunc='first'),
                      switch_1=pd.NamedAgg(column='value',aggfunc=lambda x:x.iloc[-1] 
                      if x.size>1 else np.nan))

      switch_0 switch_1
day
one      green      red
three   orange   purple
two       blue      NaN

or using df.pivot_table but pivot_table's default aggfunc is 'mean' so we need to get little dirty here. ;)

df.pivot_table(index='day',columns='switch',values='value',aggfunc=lambda x:x)
switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
  • Thank you Ch3steR for your help. I struggled a bit with the double index. Still, getting to know the unstack() function was interesting. Much appreciated you sacrificing some of your time to help me out! :) – Dana_Miles May 24 '20 at 12:12
  • @DanaMiles Glad to this answer helped. I left links to the relevant documentation pages, they have some examples to better understand what does the function do, else there's always an answer in SO regarding pretty much everything. ;) – Ch3steR May 24 '20 at 12:19
  • Thank you! SO is a great place :) – Dana_Miles May 24 '20 at 12:47
0

To have the proper ordering of days, start from conversion of day column from plain string to Categorical:

days = ['one', 'two', 'three']
dayCat = pd.Categorical(days, categories=days, ordered=True)
df.day = df.day.astype(dayCat)

Then compute the result:

result = df.groupby('day').apply(lambda grp: grp.sort_values('switch')
    .value.reset_index(drop=True)).unstack().add_prefix('switch_')\
    .add_suffix('_value').reset_index()

The result is:

     day switch_0_value switch_1_value
0    one          green            red
1    two           blue            NaN
2  three         purple         orange
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41