I am trying to subtract every row from every other row within the same dataframe. I have seen this. But the code there does not work for multi index.
Here is the structure of dataframe I have
a c z
index name
0 foo 15 8 0
bar 4 1 0
baz 7 2 0
toto 12 3 0
titi 5 0 0
1 foo 8 6 0
bar 4 1 0
baz 6 3 0
toto 5 1 0
titi 6 0 0
The structure I want is something like the following:
index name1 name2 a c z
0 foo bar 11 7 0
foo baz 8 6 0
bar baz -3 -1 0
I've tried 2 things. First one does not keep the name1
, name2
, and which is essentially the same solution in the link.
The second works but takes hours to compute:
def df_diff(newticker: pd.DataFrame):
times = newticker.index.get_level_values("index")
timesu = times.unique()
symbols = newticker.index.get_level_values("name")
symbolsu = symbols.unique()
symb_combination = list(itertools.permutations(symbolsu, 2))
tuple_list = []
#
for timet in timesu:
for sym in symb_combination:
tuple_list.append((timet, sym[0], sym[1]))
#
mindex = pd.MultiIndex.from_tuples(tuple_list,
names=["index", "name1", "name2"])
cols = newticker.columns
#
dfdiff = pd.DataFrame(columns=cols,
index=mindex)
#
for symt in symb_combination:
sym1 = symt[0]
sym2 = symt[1]
sym1df = newticker.xs(sym1, level='name')
sym2df = newticker.xs(sym2, level='name')
symdiff = sym1df.values - sym2df.values
dfdiff.loc[(slice(None), sym1, sym2), :] = symdiff
#
return dfdiff
I am sure there is a more elegant way to do what I want. I would appriciate any help.