2

I am trying to subtract every row from every other row within the same dataframe. I have seen this. But the code there does not work for multi index.

Here is the structure of dataframe I have

                 a  c  z
index name          
0       foo      15  8  0
        bar      4   1  0
        baz      7   2  0
        toto     12  3  0
        titi     5   0  0
1       foo      8   6  0
        bar      4   1  0
        baz      6   3  0
        toto     5   1  0
        titi     6  0  0

The structure I want is something like the following:

index name1 name2  a    c  z
0      foo   bar   11   7  0
       foo   baz   8    6  0
       bar   baz   -3  -1  0

I've tried 2 things. First one does not keep the name1, name2, and which is essentially the same solution in the link.

The second works but takes hours to compute:

def df_diff(newticker: pd.DataFrame):
    times = newticker.index.get_level_values("index")
    timesu = times.unique()
    symbols = newticker.index.get_level_values("name")
    symbolsu = symbols.unique()
    symb_combination = list(itertools.permutations(symbolsu, 2))
    tuple_list = []
    #
    for timet in timesu:
        for sym in symb_combination:
            tuple_list.append((timet, sym[0], sym[1]))
    #
    mindex = pd.MultiIndex.from_tuples(tuple_list,
                                       names=["index", "name1", "name2"])
    cols = newticker.columns
    #
    dfdiff = pd.DataFrame(columns=cols,
                          index=mindex)
    #
    for symt in symb_combination:
        sym1 = symt[0]
        sym2 = symt[1]
        sym1df = newticker.xs(sym1, level='name')
        sym2df = newticker.xs(sym2, level='name')
        symdiff = sym1df.values - sym2df.values
        dfdiff.loc[(slice(None), sym1, sym2), :] = symdiff
    #
    return dfdiff

I am sure there is a more elegant way to do what I want. I would appriciate any help.

Kaan E.
  • 515
  • 4
  • 16

1 Answers1

1

You have a well constructed dataframe, what we need to do is just merge , then groupby

df=newticker.reset_index()# using the index for merge key 

newdf=df.merge(df,on='index')
newdf=newdf.loc[lambda x:['name_x']!=x['name_y']]
newdf=newdf.set_index(['index','name_x','name_y'])

-newdf.groupby(newdf.columns.str.split('_').str[0],
           axis=1).diff().dropna(1)

Out[90]: 
                      a_y  c_y  z_y
index name_x name_y                
0     foo    bar     11.0  7.0 -0.0
             baz      8.0  6.0 -0.0
             toto     3.0  5.0 -0.0
             titi    10.0  8.0 -0.0
      bar    foo    -11.0 -7.0 -0.0
             baz     -3.0 -1.0 -0.0
             toto    -8.0 -2.0 -0.0
             titi    -1.0  1.0 -0.0
      baz    foo     -8.0 -6.0 -0.0
             bar      3.0  1.0 -0.0
             toto    -5.0 -1.0 -0.0
             titi     2.0  2.0 -0.0
      toto   foo     -3.0 -5.0 -0.0
             bar      8.0  2.0 -0.0
             baz      5.0  1.0 -0.0
             titi     7.0  3.0 -0.0
      titi   foo    -10.0 -8.0 -0.0
             bar      1.0 -1.0 -0.0
             baz     -2.0 -2.0 -0.0
             toto    -7.0 -3.0 -0.0
1     foo    bar      4.0  5.0 -0.0
             baz      2.0  3.0 -0.0
             toto     3.0  5.0 -0.0
             titi     2.0  6.0 -0.0
      bar    foo     -4.0 -5.0 -0.0
             baz     -2.0 -2.0 -0.0
             toto    -1.0 -0.0 -0.0
             titi    -2.0  1.0 -0.0
      baz    foo     -2.0 -3.0 -0.0
             bar      2.0  2.0 -0.0
             toto     1.0  2.0 -0.0
             titi    -0.0  3.0 -0.0
      toto   foo     -3.0 -5.0 -0.0
             bar      1.0 -0.0 -0.0
             baz     -1.0 -2.0 -0.0
             titi    -1.0  1.0 -0.0
      titi   foo     -2.0 -6.0 -0.0
             bar      2.0 -1.0 -0.0
             baz     -0.0 -3.0 -0.0
             toto     1.0 -1.0 -0.0
Kaan E.
  • 515
  • 4
  • 16
BENY
  • 317,841
  • 20
  • 164
  • 234