-1

The structure of my Multiindex dataframe looks like this:

                                  close       high        low       open  
   index = (timestamp,key)                                  
(2018-09-10 16:00:00, ask)       1.16023    1.16064    1.16007    1.16046
(2018-09-10 16:00:00, bid)       1.16009    1.16053    1.15992    1.16033
(2018-09-10 16:00:00, volume)  817.00000  817.00000  817.00000  817.00000

For each timestamp there are observartions for bid, ask and the volume.

I am trying to add to the second level of the index (i.e. [bid,ask,volume]) a "mid" observation by calculating the corresponding (bid + ask)/2.

My desired dataframe should then look like this

                                  close       high        low       open  
   index = (timestamp,key)                                  
(2018-09-10 16:00:00, ask)       1.16023    1.16064    1.16007    1.16046
(2018-09-10 16:00:00, bid)       1.16009    1.16053    1.15992    1.16033
(2018-09-10 16:00:00, volume)  817.00000  817.00000  817.00000  817.00000
(2018-09-10 16:00:00, mid)     1.16016      1.16059    1.15999    1.1604

What's the most efficient way to do this? Can this be done in place?

EDIT:

Printing out head of dataframe to see structure more clearly.

`bid_ask.head(5).to_dict()
Out[3]: 
{'close': {(Timestamp('2018-09-10 16:00:00'), 'ask'): 1.1602300000000001,
  (Timestamp('2018-09-10 16:00:00'), 'bid'): 1.1600900000000001,
  (Timestamp('2018-09-10 16:00:00'), 'volume'): 817.0,
  (Timestamp('2018-09-10 17:00:00'), 'ask'): 1.15977,
  (Timestamp('2018-09-10 17:00:00'), 'bid'): 1.15968},
 'high': {(Timestamp('2018-09-10 16:00:00'), 'ask'): 1.1606399999999999,
  (Timestamp('2018-09-10 16:00:00'), 'bid'): 1.1605300000000001,
  (Timestamp('2018-09-10 16:00:00'), 'volume'): 817.0,
  (Timestamp('2018-09-10 17:00:00'), 'ask'): 1.16039,
  (Timestamp('2018-09-10 17:00:00'), 'bid'): 1.16029},
 'low': {(Timestamp('2018-09-10 16:00:00'), 'ask'): 1.1600699999999999,
  (Timestamp('2018-09-10 16:00:00'), 'bid'): 1.1599200000000001,
  (Timestamp('2018-09-10 16:00:00'), 'volume'): 817.0,
  (Timestamp('2018-09-10 17:00:00'), 'ask'): 1.1596200000000001,
  (Timestamp('2018-09-10 17:00:00'), 'bid'): 1.1595299999999999},
 'open': {(Timestamp('2018-09-10 16:00:00'), 'ask'): 1.16046,
  (Timestamp('2018-09-10 16:00:00'), 'bid'): 1.1603300000000001,
  (Timestamp('2018-09-10 16:00:00'), 'volume'): 817.0,
  (Timestamp('2018-09-10 17:00:00'), 'ask'): 1.1601900000000001,
  (Timestamp('2018-09-10 17:00:00'), 'bid'): 1.1600999999999999}}
 `
antifragile
  • 83
  • 1
  • 8
  • I posted an answer as you're a new user so many find some existing solutions confusing. Hopefully you can see how this question is very similar to the following https://stackoverflow.com/a/24918803/4013571 I would recommend seeing if you can follow that answer as well – Alexander McFarlane Sep 11 '18 at 21:47
  • It isn't easy to figure out the structure of your frame from this paste out here. Please run `df.head(5).to_dict()` and paste the output in your question. – cs95 Sep 11 '18 at 21:52

1 Answers1

0

I am not entirely sure how your DataFrame is structured but this is the essence

df.loc[('2018-09-10 16:00:00', 'mid'), :] = [1.16016, 1.16059, 1.15999 , 1.1604]

All you need to do is use df.loc and supply a new tuple for the MultiIndex

In my guess I assumed your new MultiIndex entry was ('2018-09-10 16:00:00', 'mid')

Example

In [353]: ref

Out[353]:
       Names  Values
  idx2
1 one      A       5
2 two      B      10

In [354]: ref.loc[(3, 'three'), :] = ['C', 15]

In [355]: ref
Out[355]:
        Names  Values
  idx2
1 one       A     5.0
2 two       B    10.0
3 three     C    15.0
Alexander McFarlane
  • 10,643
  • 9
  • 59
  • 100
  • thanks Alex. Is there a way to do this for every value in the Timestamp level of the index, i.e. something along the lines of bid_ask.loc[(:,"mid), :] = 0.5 * ( bid_ask.loc[(:,"bid), :] + bid_ask.loc[(:,"ask), :] ) . I am not sure how slicing works on an Multiindex object. – antifragile Sep 11 '18 at 22:19