- You need to
.groupby
to get the difference with diff().abs()
in s1
- Then, you need to
.groupby
to get the name of the total columns in s2
and concat
s1 and s2 together.
- From there,
append
the results of s1
and s2
to the dataframe and use .T
to transpose the dataframe after setting the index to column_names
, so values in column_names
show up as the column headers instead of being in the first row, which is what would happen without .set_index()
:
data = {'column_names':['foo_1','foo_2','bar_1','bar_2'], 'Totals':[1050,400,450,300]}
df = pd.DataFrame(data)
s = df['column_names'].str.split('_').str[0]
s1 = df.groupby(s)['Totals'].diff().abs()
s2 = df.groupby(s)['column_names'].apply(lambda x: x.str.split('_').str[0] + '__% Diff')
df = (df.append(pd.concat([s1,s2], axis=1).dropna())
.sort_values('column_names').set_index('column_names').T)
df.columns = df.columns.str.replace('__', ' ') # the column name was chosen carefully to preserve order and then you can simply replace __ with a space
df
Out[1]:
column_names bar_1 bar_2 bar % Diff foo_1 foo_2 foo % Diff
Totals 450.0 300.0 150.0 1050.0 400.0 650.0
Or, if you are looking for the percentage change with pct_change()
:
data = {'column_names':['foo_1','foo_2','bar_1','bar_2'], 'Totals':[1050,400,450,300]}
df = pd.DataFrame(data)
s = df['column_names'].str.split('_').str[0]
s1 = df.groupby(s)['Totals'].apply(lambda x: x.pct_change())
s2 = df.groupby(s)['column_names'].apply(lambda x: x.str.split('_').str[0] + '__% Diff')
df = (df.append(pd.concat([s1,s2], axis=1).dropna())
.sort_values('column_names').set_index('column_names').T)
df.columns = df.columns.str.replace('__', ' ') # the column name was chosen carefully to preserve order and then you can simply replace __ with a space
df
Out[2]:
column_names bar_1 bar_2 bar % Diff foo_1 foo_2 foo % Diff
Totals 450.0 300.0 -0.333333 1050.0 400.0 -0.619048