So this is my DataFrame:
Basically what's in there:
- There is
Query-Date
column, and foreachQuery Date
date, there are 30Check-In
days forward, and foreachCheck-In
date there are 5 days forward.
Note: The time formart is Day/Month/Year
Note: The hotel name is the same for every row, only the Price
and Nights
columns are different ( and the dates )
- There is
Nights
column, it's basically subtraction betweenCheck-out
andCheck-In
An example for a DataFrame:
+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 1/1/2000 | 1/1/2000 | 2/1/2000 | HotelName1 | 10 | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/1/2000 | HotelName1 | 21 | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | ... | .. | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | ... | .. | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | ... | .. | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 2/1/2000 | 3/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | ... | | | | |
+------------+-----------+-----------+------------+-------+--------+
| | 30/1/2000 | 31/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 1/2/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 2/2/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/2/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/2/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| 2/1/2000 | 2/1/2000 | 2/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | ... | | | |
+------------+-----------+-----------+------------+-------+--------+
Now, in some rows, there are missing dates, so for example we could find something like this:
+------------+-----------+-----------+------------+-------+--------+
| 3/1/2000 | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | 4/1/2000 | 5/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 9/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
We can notice that for the Query-Date
which the value "3/1/2000" and for the Check-In
"3/1/2000" there are two missings: the Date "5/1/2000" ( 2 Nights ) and "8/1/2000" ( 5 Nights )
What I want is to add these days, with the same Hotel Name and with the price of the mean of the closest previous row with the same Nights
value with the closest forward row with the same Nights
value
But this thing is more complicated because the missing can be for a whole Query Date
or even few.
So basically I found several topics:
- https://stackoverflow.com/a/19324591
Basically they say that the Date should be the index, and then we can use the
pd.date_range
andreindex
, So what I did is to determine these 3 columns:Query-Date
,Check-In
,Check-Out
to be the index:
pd.set_index(['Query Date', 'Check-In', 'Check-Out'], inplace=True)
I also can find the min and max value of Query-Date, but I couldn't find a way to make a range for 3 columns.
- https://stackoverflow.com/a/44102947/11356272 In this topic there's a code to fill these null gaps the previous and forward one, but it's not really my case, because I don't need the LAST previous and forward rows for the mean, but the the LAST previous and forward rows with the same nights.
I need something like that in pseudo code:
query_date = min(pd['Query Date'])
while(query_date != max(pd['Query Date'])):
for(i in range(0, 30 + 1)):
for(j in range(0,5 + 1)):
if(check if row with Query-Date: query_date
and Check-In: query_date + (i days)
and Check-Out : query_date + (i + j days) is not exists):
add to pd new row with these Query-Date, Check-In, Check-out with price: (The last price value with the same + the forward price value with the same amount) / 2
Hopefully you guys could help me do that.