0

After having searched for similar questions I found out with this and this questions. Unfortunately neither of them works with me.

The first works on all the columns, the second does not work on my column of True and False and returns error (I also have not understood it completely).

Here's a description of the problem:

I am working with a dataframe of ~54k rows. Here's an example of 24 values:

+----+---------------------+---------------------+----------------------+--------------------+-------+
|    |        date         |       omegasr       |        omega         |      omegass       | isday |
+----+---------------------+---------------------+----------------------+--------------------+-------+
|  1 | 2012-03-27 00:00:00 | -1.5707963267948966 |    -3.32335035194977 | 1.5707963267948966 | False |
|  2 | 2012-03-27 01:00:00 | -1.5707963267948966 |  -3.0615509641506207 | 1.5707963267948966 | False |
|  3 | 2012-03-27 02:00:00 | -1.5707963267948966 |   -2.799751576351471 | 1.5707963267948966 | False |
|  4 | 2012-03-27 03:00:00 | -1.5707963267948966 |  -2.5379521885523215 | 1.5707963267948966 | False |
|  5 | 2012-03-27 04:00:00 | -1.5707963267948966 |  -2.2761528007531724 | 1.5707963267948966 | False |
|  6 | 2012-03-27 05:00:00 | -1.5707963267948966 |   -2.014353412954023 | 1.5707963267948966 | False |
|  7 | 2012-03-27 06:00:00 | -1.5707963267948966 |  -1.7525540251548732 | 1.5707963267948966 | False |
|  8 | 2012-03-27 07:00:00 | -1.5707963267948966 |  -1.4907546373557239 | 1.5707963267948966 | True  |
|  9 | 2012-03-27 08:00:00 | -1.5707963267948966 |  -1.2289552495565745 | 1.5707963267948966 | True  |
| 10 | 2012-03-27 09:00:00 | -1.5707963267948966 |  -0.9671558617574253 | 1.5707963267948966 | True  |
| 11 | 2012-03-27 10:00:00 | -1.5707963267948966 |  -0.7053564739582756 | 1.5707963267948966 | True  |
| 12 | 2012-03-27 11:00:00 | -1.5707963267948966 | -0.44355708615912615 | 1.5707963267948966 | True  |
| 13 | 2012-03-27 12:00:00 | -1.5707963267948966 |  -0.1817576983599767 | 1.5707963267948966 | True  |
| 14 | 2012-03-27 13:00:00 | -1.5707963267948966 |  0.08004168943917273 | 1.5707963267948966 | True  |
| 15 | 2012-03-27 14:00:00 | -1.5707963267948966 |  0.34184107723832213 | 1.5707963267948966 | True  |
| 16 | 2012-03-27 15:00:00 | -1.5707963267948966 |   0.6036404650374716 | 1.5707963267948966 | True  |
| 17 | 2012-03-27 16:00:00 | -1.5707963267948966 |   0.8654398528366211 | 1.5707963267948966 | True  |
| 18 | 2012-03-27 17:00:00 | -1.5707963267948966 |    1.127239240635771 | 1.5707963267948966 | True  |
| 19 | 2012-03-27 18:00:00 | -1.5707963267948966 |   1.3890386284349199 | 1.5707963267948966 | True  |
| 20 | 2012-03-27 19:00:00 | -1.5707963267948966 |   1.6508380162340692 | 1.5707963267948966 | False |
| 21 | 2012-03-27 20:00:00 | -1.5707963267948966 |   1.9126374040332188 | 1.5707963267948966 | False |
| 22 | 2012-03-27 21:00:00 | -1.5707963267948966 |    2.174436791832368 | 1.5707963267948966 | False |
| 23 | 2012-03-27 22:00:00 | -1.5707963267948966 |   2.4362361796315177 | 1.5707963267948966 | False |
| 24 | 2012-03-27 23:00:00 | -1.5707963267948966 |    2.698035567430667 | 1.5707963267948966 | False |
+----+---------------------+---------------------+----------------------+--------------------+-------+

omega is the solar hour angle in radians. It ranges from -pi/2 to +pi/2 for the hours 00:00 and 24:00 respectively. At midday its value is 0.

omegass is the hour angle to which the sunset occurs. Due to the symmetry of the sun-earth system, omegasr = -omegass. These values are constant along one day, but change for every day.

The column isday is a result of a conditional expression: when omegasr < omega < omegasr then it's day and further calculations can be made.

In order to do further calculations I need to associate for each hour the midpoint of the time span that the measure covers. So, for example, the midday measure was recorded at 12:00 but in order to represent all of that hour I want to have the hour angle of 12:30. Therefore I need a

omegam[i] = (omega[i],omega[i+1]).mean() 

where i represents the index.

But here a new problem arises: if the sunset occurs, let's say, at 6:40 am then the midpoint hour has to be calculated like this:

omegam[i] = (omegasr[i],omega[i+1]).mean() #sunrise
omegam[i] = (omega[i],omegass[i+1]).mean() #sunset

Thus the hourly radian angle will correspond to 6:50am. I created the column isday to help perform this task, but unfortunately I can't really use it.

Thank you.

EDIT:

The solution proposed by @Mabel Villaba is not correct, for the new_omega column only has sunrise and sunset values.

A coorect new_omega column would be:

 new_omega  
... 
7   #here the mean is between omegasr and omega[8], therefore this new_omega value can't have a correct value, according to the proposed solution.

8   -1.2289552495565745 # = omega[9]       
9   omega[10]  #                  
10  omega[11]
... 
17   omega[18] 
18   omega[19] 
19   1.570796  #omegass
...

I hope that it is clear enough

EDIT2:

Thank you again, but the values are still not correct: the mean values are still calculated wrongly. I have calculated manually the correct values, I will post them here:

     omegam

...
7    -1.530775
8    -1.359855
9    -1.098058
...
13   -0.05256705
...
19   1.47992
...

EDIT3:

I think the column df['isday'] obtained thanks to the boolean mask might be misleading.

In fact: the sunrise always occurs between two rows, let them be called omega1 and omega2, whom belong to row1 and row2 respectively. The same happens with the sunset, but withomega3 and omega4. What happens is that the correct omegam of row1 is calculated as:

omegam(row1) = (omegasr + omega2)/2

but row1 hase a False attribute in the isday column.

For the sunset it's the opposite: occurring between row3 and row4 it is calculated as:

omegam(row3) = (omega3 + omegass)/2

and row3 has a True attribute.

GGiacomo
  • 75
  • 1
  • 8

1 Answers1

1

EDIT

In the case you mention, it is a little more complicated but I think I came to a workaround. There is some misleading, since the operation at sunrise and sunset is not always done in the same direction.

Let us create two omegas, omega1 that does omegam[i] = 0.5 * (omega[i] + omegasr[i+1]) and another omega2 that does omegam[i] = 0.5 * (omega[i-1] + omegass[i]):

df['omega1'] = .5*((df['omega'] + df['omegasr'].shift(-1)))   
df['omega2'] = .5*((df['omega'].shift(1) + df['omegass']))

Then, we need to create a mask that tells us whether it is sunset or sunrise, or none of them:

df['mask'] =  (df['isday'] * 1).diff().bfill()

>> df[['date','mask', 'isday']]

                     date  mask  isday
0    2012-03-27 00:00:00    0.0  False
1    2012-03-27 01:00:00    0.0  False
2    2012-03-27 02:00:00    0.0  False
3    2012-03-27 03:00:00    0.0  False
4    2012-03-27 04:00:00    0.0  False
5    2012-03-27 05:00:00    0.0  False
6    2012-03-27 06:00:00    0.0  False
7    2012-03-27 07:00:00    1.0   True
8    2012-03-27 08:00:00    0.0   True
9    2012-03-27 09:00:00    0.0   True
10   2012-03-27 10:00:00    0.0   True
11   2012-03-27 11:00:00    0.0   True
12   2012-03-27 12:00:00    0.0   True
13   2012-03-27 13:00:00    0.0   True
14   2012-03-27 14:00:00    0.0   True
15   2012-03-27 15:00:00    0.0   True
16   2012-03-27 16:00:00    0.0   True
17   2012-03-27 17:00:00    0.0   True
18   2012-03-27 18:00:00    0.0   True
19   2012-03-27 19:00:00   -1.0  False
20   2012-03-27 20:00:00    0.0  False
21   2012-03-27 21:00:00    0.0  False
22   2012-03-27 22:00:00    0.0  False
23   2012-03-27 23:00:00    0.0  False

This way, df['mask']==1 corresponds to sunrise, df['mask']==-1 to sunset and df['mask']==0 corresponds to the rest.

Based on these conditions, we can create omegam:

df['omegam'] = df['omega'].rolling(2).mean() * (df['mask'] == 0) + \
               df['omega1'] * (df['mask']==1) + \
               df['omega2'] * (df['mask']==-1)

>> df[['date','omegam']]

                     date    omegam
0    2012-03-27 00:00:00        NaN
1    2012-03-27 01:00:00  -3.192451
2    2012-03-27 02:00:00  -2.930651
3    2012-03-27 03:00:00  -2.668852
4    2012-03-27 04:00:00  -2.407052
5    2012-03-27 05:00:00  -2.145253
6    2012-03-27 06:00:00  -1.883454
7    2012-03-27 07:00:00  -1.530775
8    2012-03-27 08:00:00  -1.359855
9    2012-03-27 09:00:00  -1.098056
10   2012-03-27 10:00:00  -0.836256
11   2012-03-27 11:00:00  -0.574457
12   2012-03-27 12:00:00  -0.312657
13   2012-03-27 13:00:00  -0.050858
14   2012-03-27 14:00:00   0.210941
15   2012-03-27 15:00:00   0.472741
16   2012-03-27 16:00:00   0.734540
17   2012-03-27 17:00:00   0.996340
18   2012-03-27 18:00:00   1.258139
19   2012-03-27 19:00:00   1.479917
20   2012-03-27 20:00:00   1.781738
21   2012-03-27 21:00:00   2.043537
22   2012-03-27 22:00:00   2.305336
23   2012-03-27 23:00:00        NaN

OLD SOLUTION:

As you mention, since omegasr = -omegass, then you could create a new column in your pandas based on the hour so you can get the omega you need for the mean operation (if sunrise (hour<12): omegasr, else: - omegasr):

df['new_omega'] = df.apply(lambda x: x['omegasr'] if pd.to_datetime(x['date']).hour < 12 else -x['omegasr'], axis=1).shift(-1)

>> df

                     date   omegasr     omega   omegass  isday  new_omega

1    2012-03-27 00:00:00  -1.570796 -3.323350  1.570796  False  -1.570796
2    2012-03-27 01:00:00  -1.570796 -3.061551  1.570796  False  -1.570796
3    2012-03-27 02:00:00  -1.570796 -2.799752  1.570796  False  -1.570796
4    2012-03-27 03:00:00  -1.570796 -2.537952  1.570796  False  -1.570796
5    2012-03-27 04:00:00  -1.570796 -2.276153  1.570796  False  -1.570796
6    2012-03-27 05:00:00  -1.570796 -2.014353  1.570796  False  -1.570796
7    2012-03-27 06:00:00  -1.570796 -1.752554  1.570796  False  -1.570796
8    2012-03-27 07:00:00  -1.570796 -1.490755  1.570796   True  -1.570796
9    2012-03-27 08:00:00  -1.570796 -1.228955  1.570796   True  -1.570796
10   2012-03-27 09:00:00  -1.570796 -0.967156  1.570796   True  -1.570796
11   2012-03-27 10:00:00  -1.570796 -0.705356  1.570796   True  -1.570796
12   2012-03-27 11:00:00  -1.570796 -0.443557  1.570796   True   1.570796
13   2012-03-27 12:00:00  -1.570796 -0.181758  1.570796   True   1.570796
14   2012-03-27 13:00:00  -1.570796  0.080042  1.570796   True   1.570796
15   2012-03-27 14:00:00  -1.570796  0.341841  1.570796   True   1.570796
16   2012-03-27 15:00:00  -1.570796  0.603640  1.570796   True   1.570796
17   2012-03-27 16:00:00  -1.570796  0.865440  1.570796   True   1.570796
18   2012-03-27 17:00:00  -1.570796  1.127239  1.570796   True   1.570796
19   2012-03-27 18:00:00  -1.570796  1.389039  1.570796   True   1.570796
20   2012-03-27 19:00:00  -1.570796  1.650838  1.570796  False   1.570796
21   2012-03-27 20:00:00  -1.570796  1.912637  1.570796  False   1.570796
22   2012-03-27 21:00:00  -1.570796  2.174437  1.570796  False   1.570796
23   2012-03-27 22:00:00  -1.570796  2.436236  1.570796  False   1.570796
24   2012-03-27 23:00:00  -1.570796  2.698036  1.570796  False        NaN

Data in 'new_omega' is shifted to comply with

omegam[i] = (omegasr[i],omega[i+1]).mean() #sunrise
omegam[i] = (omega[i],omegass[i+1]).mean() #sunset

Then, omegam can be obtained just by applying the mean to the columns omega and new_omega when the condition df['isday']==True is satisfied or the mean(omega[i], omega[i+1]) when df['isday']==False :

# Calculate the rolling mean with a window=2 and then shift backwards 
# (by default pd.rollling(2).mean() would result in 
# omegam[i]= mean(omega[i-1], omega[i]))


df['omegam'] = df['omega'].rolling(2).mean().shift(-1)


df['omegam'][df['isday']] = df[['omega', 'new_omega']][df['isday']].mean(axis=1).values

>> df['omegam']

1    -3.192451
2    -2.930651
3    -2.668852
4    -2.407052
5    -2.145253
6    -1.883454
7    -1.621654
8    -1.530775
9    -1.399876
10   -1.268976
11   -1.138076
12    0.563620
13    0.694519
14    0.825419
15    0.956319
16    1.087218
17    1.218118
18    1.349018
19    1.479917
20    1.781738
21    2.043537
22    2.305336
23    2.567136
24         NaN
Name: omegam, dtype: float64

Hope it serves.

Mabel Villalba
  • 2,538
  • 8
  • 19
  • 1
    This does not completely correct, I will edit the question with the `new_omega` that would be correct. – GGiacomo Sep 13 '18 at 13:18
  • Fixed it, now it should work they way you asked. Tricky question! – Mabel Villalba Sep 13 '18 at 14:05
  • Thank you but it is still not working correctly: I have done manually the calculations, and in your solution just `df['omegam'].iloc[19]` assumes the correct value. `df['omegam'].iloc[8]` has a correct value too, but it should be in position number 7 (sunrise occurs between rows 7 and 8, therefore the `omegam` in `df['omegam'].iloc[7]` is calculated between the sunrise and the beginning of the following hour. Also if you do the calculations for the two rows in which `omega` becomes positive, you would have an `df['omegam'].iloc[13] = -0.0525...` – GGiacomo Sep 13 '18 at 16:48
  • 1
    ok! The issue was with the calculation and sunrise and sunset, that is not done always in the same direction and also, that the rest of the time it is the mean of omega the one applied. – Mabel Villalba Sep 13 '18 at 20:09
  • This time it works! Thank you very much, I had no idea that it would have been so tricky! – GGiacomo Sep 13 '18 at 21:53