After searching for similar questions I found this and this question. Unfortunately, neither of them works for me.
The first works on all the columns; the second does not work on my column of True and False values and returns an error (I also have not understood it completely).
Here's a description of the problem:
I am working with a dataframe of ~54k rows. Here's an example of 24 values:
+----+---------------------+---------------------+----------------------+--------------------+-------+
| | date | omegasr | omega | omegass | isday |
+----+---------------------+---------------------+----------------------+--------------------+-------+
| 1 | 2012-03-27 00:00:00 | -1.5707963267948966 | -3.32335035194977 | 1.5707963267948966 | False |
| 2 | 2012-03-27 01:00:00 | -1.5707963267948966 | -3.0615509641506207 | 1.5707963267948966 | False |
| 3 | 2012-03-27 02:00:00 | -1.5707963267948966 | -2.799751576351471 | 1.5707963267948966 | False |
| 4 | 2012-03-27 03:00:00 | -1.5707963267948966 | -2.5379521885523215 | 1.5707963267948966 | False |
| 5 | 2012-03-27 04:00:00 | -1.5707963267948966 | -2.2761528007531724 | 1.5707963267948966 | False |
| 6 | 2012-03-27 05:00:00 | -1.5707963267948966 | -2.014353412954023 | 1.5707963267948966 | False |
| 7 | 2012-03-27 06:00:00 | -1.5707963267948966 | -1.7525540251548732 | 1.5707963267948966 | False |
| 8 | 2012-03-27 07:00:00 | -1.5707963267948966 | -1.4907546373557239 | 1.5707963267948966 | True |
| 9 | 2012-03-27 08:00:00 | -1.5707963267948966 | -1.2289552495565745 | 1.5707963267948966 | True |
| 10 | 2012-03-27 09:00:00 | -1.5707963267948966 | -0.9671558617574253 | 1.5707963267948966 | True |
| 11 | 2012-03-27 10:00:00 | -1.5707963267948966 | -0.7053564739582756 | 1.5707963267948966 | True |
| 12 | 2012-03-27 11:00:00 | -1.5707963267948966 | -0.44355708615912615 | 1.5707963267948966 | True |
| 13 | 2012-03-27 12:00:00 | -1.5707963267948966 | -0.1817576983599767 | 1.5707963267948966 | True |
| 14 | 2012-03-27 13:00:00 | -1.5707963267948966 | 0.08004168943917273 | 1.5707963267948966 | True |
| 15 | 2012-03-27 14:00:00 | -1.5707963267948966 | 0.34184107723832213 | 1.5707963267948966 | True |
| 16 | 2012-03-27 15:00:00 | -1.5707963267948966 | 0.6036404650374716 | 1.5707963267948966 | True |
| 17 | 2012-03-27 16:00:00 | -1.5707963267948966 | 0.8654398528366211 | 1.5707963267948966 | True |
| 18 | 2012-03-27 17:00:00 | -1.5707963267948966 | 1.127239240635771 | 1.5707963267948966 | True |
| 19 | 2012-03-27 18:00:00 | -1.5707963267948966 | 1.3890386284349199 | 1.5707963267948966 | True |
| 20 | 2012-03-27 19:00:00 | -1.5707963267948966 | 1.6508380162340692 | 1.5707963267948966 | False |
| 21 | 2012-03-27 20:00:00 | -1.5707963267948966 | 1.9126374040332188 | 1.5707963267948966 | False |
| 22 | 2012-03-27 21:00:00 | -1.5707963267948966 | 2.174436791832368 | 1.5707963267948966 | False |
| 23 | 2012-03-27 22:00:00 | -1.5707963267948966 | 2.4362361796315177 | 1.5707963267948966 | False |
| 24 | 2012-03-27 23:00:00 | -1.5707963267948966 | 2.698035567430667 | 1.5707963267948966 | False |
+----+---------------------+---------------------+----------------------+--------------------+-------+
omega is the solar hour angle in radians. It ranges from -pi/2 to +pi/2 for the hours 00:00 and 24:00 respectively. At midday its value is 0.
omegass is the hour angle at which sunset occurs. Due to the symmetry of the sun-earth system, omegasr = -omegass. These values are constant within one day, but change from day to day.
The column isday is the result of a conditional expression: when omegasr < omega < omegass it is day, and further calculations can be made.
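For reference, a minimal sketch of how such a mask could be built in pandas, using three rows around sunrise copied from the table above (rows 7 to 9). The small DataFrame here is only an illustration, not my full data:

```python
import pandas as pd

# Rows 7-9 of the sample table (the hours around sunrise).
df = pd.DataFrame(
    {
        "omegasr": [-1.5707963267948966] * 3,
        "omega": [-1.7525540251548732, -1.4907546373557239, -1.2289552495565745],
        "omegass": [1.5707963267948966] * 3,
    },
    index=[7, 8, 9],
)

# Element-wise comparisons produce boolean Series; & combines them row by row.
df["isday"] = (df["omegasr"] < df["omega"]) & (df["omega"] < df["omegass"])
print(df["isday"].tolist())  # [False, True, True]
```

Chained comparisons such as `a < b < c` do not work on Series, which is why the two conditions are written separately and joined with `&`.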
In order to do further calculations I need to associate with each hour the midpoint of the time span that the measurement covers. For example, the midday measurement was recorded at 12:00, but to represent that whole hour I want the hour angle of 12:30. Therefore I need
omegam[i] = (omega[i] + omega[i+1]) / 2
where i represents the row index.
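Ignoring sunrise and sunset for a moment, the plain midpoint can probably be written without a loop by pairing each row with the next one through `shift(-1)`. This is only a sketch on three values from the table (rows 8 to 10):

```python
import pandas as pd

# omega for rows 8-10 of the sample table.
omega = pd.Series(
    [-1.4907546373557239, -1.2289552495565745, -0.9671558617574253],
    index=[8, 9, 10],
)

# shift(-1) aligns omega[i+1] with row i; the last row has no successor,
# so its midpoint is NaN.
omegam = (omega + omega.shift(-1)) / 2
print(omegam)
```

Row 8 comes out as roughly -1.359855, and the last row is NaN because there is no following hour to average with.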
But here a new problem arises: if sunrise occurs at, say, 6:40 am, then the midpoint has to be calculated like this:
omegam[i] = (omegasr[i] + omega[i+1]) / 2  # sunrise
omegam[i] = (omega[i] + omegass[i+1]) / 2  # sunset
Thus the hour angle for that row will correspond to 6:50 am. I created the column isday to help perform this task, but unfortunately I can't really use it.
Thank you.
EDIT:
The solution proposed by @Mabel Villaba is not correct, because the new_omega column only has sunrise and sunset values.
A correct new_omega column would be:
new_omega
...
7 #here the mean is between omegasr and omega[8], therefore this new_omega value can't have a correct value, according to the proposed solution.
8 -1.2289552495565745 # = omega[9]
9 omega[10] #
10 omega[11]
...
17 omega[18]
18 omega[19]
19 1.570796 #omegass
...
I hope that is clear enough.
EDIT2:
Thank you again, but the values are still not correct: the mean values are still calculated wrongly. I have calculated the correct values manually and post them here:
omegam
...
7 -1.530775
8 -1.359855
9 -1.098058
...
13 -0.05256705
...
19 1.47992
...
EDIT3:
I think the column df['isday'] obtained from the boolean mask might be misleading.
In fact, sunrise always occurs between two rows, row1 and row2, whose hour angles I will call omega1 and omega2. The same happens with sunset, but with omega3 and omega4 (rows row3 and row4). The correct omegam of row1 is calculated as:
omegam(row1) = (omegasr + omega2)/2
but row1 has a False value in the isday column.
For sunset it is the opposite: since it occurs between row3 and row4, omegam is calculated as:
omegam(row3) = (omega3 + omegass)/2
and row3 has a True value.
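One way to express the rule above without relying on isday might be to clamp both ends of each hourly interval to [omegasr, omegass] before averaging, and to keep only the intervals that overlap daylight. The sketch below is an assumption about how this could be done, not a confirmed solution: `midpoint_hour_angle` is a hypothetical helper name, and it assumes the rows are consecutive hours and that sunrise and sunset never fall in the interval that spans midnight. It uses the 24 sample rows from the table.

```python
import pandas as pd

HALF_PI = 1.5707963267948966

# The 24 omega values from the sample table, indexed 1..24 as in the post.
omega = [
    -3.32335035194977, -3.0615509641506207, -2.799751576351471, -2.5379521885523215,
    -2.2761528007531724, -2.014353412954023, -1.7525540251548732, -1.4907546373557239,
    -1.2289552495565745, -0.9671558617574253, -0.7053564739582756, -0.44355708615912615,
    -0.1817576983599767, 0.08004168943917273, 0.34184107723832213, 0.6036404650374716,
    0.8654398528366211, 1.127239240635771, 1.3890386284349199, 1.6508380162340692,
    1.9126374040332188, 2.174436791832368, 2.4362361796315177, 2.698035567430667,
]
df = pd.DataFrame(
    {"omegasr": -HALF_PI, "omega": omega, "omegass": HALF_PI},
    index=range(1, 25),
)


def midpoint_hour_angle(df):
    """Hypothetical helper: midpoint of each hourly interval, clamped to daylight."""
    nxt = df["omega"].shift(-1)  # omega[i+1] aligned with row i
    # Clamp both interval ends to the day's [omegasr, omegass] range.
    start = df["omega"].clip(lower=df["omegasr"], upper=df["omegass"])
    end = nxt.clip(lower=df["omegasr"], upper=df["omegass"])
    # The interval contains daylight if it ends after sunrise and starts before sunset.
    has_daylight = (nxt > df["omegasr"]) & (df["omega"] < df["omegass"])
    # Rows whose interval is entirely at night become NaN.
    return ((start + end) / 2).where(has_daylight)


df["omegam"] = midpoint_hour_angle(df)
print(df.loc[[6, 7, 8, 19, 20], "omegam"])
```

On this sample, rows 7, 8, 9 and 19 agree with the hand-calculated values in EDIT2 to about five decimal places (row 7 uses omegasr, row 19 uses omegass), and rows outside 7 to 19 are NaN. Row 13 comes out as about -0.05086 from the table values, which is slightly different from my hand-calculated figure.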