0

I have a table that looks like this:

> data = {'index':[0,1,2,3],'column_names':['foo_1','foo_2','bar_1','bar_2'], 'Totals':[1050,400,450,300]}

and I want to do three things:

  1. Pivot each row in the 'column name' column to an actual column name.
  2. Create an additional column whose values are the difference of the values of ('foo_1' and 'foo_2') and ('bar_1' and 'bar_2')
  3. This also needs to be a dataframe object that looks like this: data_t = {'foo_1':[1050],'foo_2':[400],'Foo % Diff':[650],'bar_1':[450],'bar_2':[300],'Bar % Diff':[150]}

Really would appreciate how to do this and explanations.

  • What have you tried? Also please take a look at how to [provide reproducible sample data](https://stackoverflow.com/questions/20109391) on this site. – Bill Huang Oct 19 '20 at 21:24
  • @StargazingFish please accept one of the answers as solution by clicking the checkmark next to the answer. Thank you! – David Erickson Dec 10 '20 at 05:59

1 Answers1

0
  1. You need to .groupby to get the difference with diff().abs() in s1
  2. Then, you need to .groupby to get the name of the total columns in s2 and concat s1 and s2 together.
  3. From there, append the results of s1 and s2 to the dataframe and use .T to transpose the dataframe after setting the index to column_names, so values in column_names show up as the column headers instead of being in the first row, which is what would happen without .set_index():

data = {'column_names':['foo_1','foo_2','bar_1','bar_2'], 'Totals':[1050,400,450,300]}
df = pd.DataFrame(data)
s = df['column_names'].str.split('_').str[0]
s1 = df.groupby(s)['Totals'].diff().abs()
s2 = df.groupby(s)['column_names'].apply(lambda x: x.str.split('_').str[0] + '__% Diff')
df = (df.append(pd.concat([s1,s2], axis=1).dropna())
      .sort_values('column_names').set_index('column_names').T)
df.columns = df.columns.str.replace('__', ' ') # the column name was chosen carefully to preserve order and then you can simply replace __ with a space
df
Out[1]: 
column_names  bar_1  bar_2  bar % Diff   foo_1  foo_2  foo % Diff
Totals        450.0  300.0       150.0  1050.0  400.0       650.0 

Or, if you are looking for the percentage change with pct_change():

data = {'column_names':['foo_1','foo_2','bar_1','bar_2'], 'Totals':[1050,400,450,300]}
df = pd.DataFrame(data)
s = df['column_names'].str.split('_').str[0]
s1 = df.groupby(s)['Totals'].apply(lambda x: x.pct_change())
s2 = df.groupby(s)['column_names'].apply(lambda x: x.str.split('_').str[0] + '__% Diff')
df = (df.append(pd.concat([s1,s2], axis=1).dropna())
      .sort_values('column_names').set_index('column_names').T)
df.columns = df.columns.str.replace('__', ' ') # the column name was chosen carefully to preserve order and then you can simply replace __ with a space
df
Out[2]: 
column_names  bar_1  bar_2  bar % Diff   foo_1  foo_2  foo % Diff
Totals        450.0  300.0   -0.333333  1050.0  400.0   -0.619048
David Erickson
  • 16,433
  • 2
  • 19
  • 35