0

I have a dataframe which looks like this:

    date        day      normalized_returns
0   2020-01-02  Thursday    
1   2020-01-03  Friday      0.4707137200215769
2   2020-01-06  Monday      0.23570968223068864
3   2020-01-07  Tuesday    -0.001668590491460948
4   2020-01-08  Wednesday   0.22958625059099985
5   2020-01-09  Thursday   -0.08590873731461762
6   2020-01-10  Friday     -0.022114285207011284
7   2020-01-13  Monday      0.051517831335272615
8   2020-01-14  Tuesday     0.0750307910411988
9   2020-01-15  Wednesday   0.07460670647513437
10  2020-01-16  Thursday   -0.04093466297777635
11  2020-01-17  Friday     -0.020633373620607427
12  2020-01-21  Tuesday     0.21581535779928024
13  2020-01-22  Wednesday  -0.07193613192906866
14  2020-01-23  Thursday    0.06265156014426786
15  2020-01-24  Friday     -0.040948868344091706
16  2020-01-27  Monday      0.4539355576206815
17  2020-01-28  Tuesday    -0.1635280764724063
18  2020-01-29  Wednesday  -0.0851700137388777
19  2020-01-30  Thursday    0.20162612444937758

With that, I'm trying to create a dataframe where the columns are day and the values under each day are the normalized returns associated with each day, such as in this format:

          Monday  Tuesday Wednesday  Thursday  Friday   
returns   0.034   0.099   0.02       0.007    -0.007
returns   0.283   0.115   0.025      0.009    -0.04
returns   0.253   0.044   0.019      0.011     0.045
  ...      ...     ...     ...        ...       ...
  ...      ...     ...     ...        ...       ...

I've tried using pivot table, but that returns a mean aggregation of the data in the form of:

            Monday  Tuesday Wednesday Thursday Friday
   value    0.066   0.09    0.04      0.009    -0.015

However, this doesn't obtain the correct result.

I've looked at the question linked here:, but the dataframe in question has categorical row values to set as the index, where here I just have sequential dates.

I could attempt to make unique identifiers of each day as a count, but unsure how to do that.

Any help or advice to obtain the answer is greatly appreciated; thank you!

birdman
  • 249
  • 1
  • 13
  • what's the index of your desired output? in fact, what's the desired output from the given input? –  Apr 14 '22 at 17:18
  • The index would just be the count of total values (0, 1, 2, 3.... n) @enke – birdman Apr 14 '22 at 17:21
  • I think your case is Q10 in the duplicate. If it doesn't fit, please edit your question to be more specific. –  Apr 14 '22 at 17:24

1 Answers1

0

Does this work:

years = df.date.dt.year.rename('year')
weeks = df.date.dt.isocalendar().week.rename('week')
df.set_index([years, weeks, 'day']).normalized_returns.unstack()

day          Friday    Monday  Thursday   Tuesday  Wednesday
year week                                                   
2020 1     0.470714       NaN       NaN       NaN        NaN
     2    -0.022114  0.235710 -0.085909 -0.001669   0.229586
     3    -0.020633  0.051518 -0.040935  0.075031   0.074607
     4    -0.040949       NaN  0.062652  0.215815  -0.071936
     5          NaN  0.453936  0.201626 -0.163528  -0.085170
piRSquared
  • 285,575
  • 57
  • 475
  • 624