2

So this is my DataFrame:

Basically what's in there:

  • There is Query-Date column, and foreach Query Date date, there are 30 Check-In days forward, and foreach Check-In date there are 5 days forward.

Note: The time formart is Day/Month/Year

Note: The hotel name is the same for every row, only the Price and Nights columns are different ( and the dates )

  • There is Nights column, it's basically subtraction between Check-out and Check-In

An example for a DataFrame:

+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In  | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 1/1/2000   | 1/1/2000  | 2/1/2000  | HotelName1 | 10    | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/1/2000  | HotelName1 | 21    | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  | ...        | ..    | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  | ...        | ..    | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  | ...        | ..    | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 2/1/2000  | 3/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 3/1/2000  | 4/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 8/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | ...       |           |            |       |        |
+------------+-----------+-----------+------------+-------+--------+
|            | 30/1/2000 | 31/1/2000 |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 1/2/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 2/2/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/2/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/2/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
| 2/1/2000   | 2/1/2000  | 2/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 3/1/2000  | ...       |            |       |        |
+------------+-----------+-----------+------------+-------+--------+

Now, in some rows, there are missing dates, so for example we could find something like this:

+------------+-----------+-----------+------------+-------+--------+
| 3/1/2000   | 3/1/2000  | 4/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            | 4/1/2000  | 5/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 8/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 9/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+

We can notice that for the Query-Date which the value "3/1/2000" and for the Check-In "3/1/2000" there are two missings: the Date "5/1/2000" ( 2 Nights ) and "8/1/2000" ( 5 Nights )

What I want is to add these days, with the same Hotel Name and with the price of the mean of the closest previous row with the same Nights value with the closest forward row with the same Nights value

But this thing is more complicated because the missing can be for a whole Query Date or even few.

So basically I found several topics:

  • https://stackoverflow.com/a/19324591 Basically they say that the Date should be the index, and then we can use the pd.date_range and reindex, So what I did is to determine these 3 columns: Query-Date, Check-In, Check-Out to be the index:

pd.set_index(['Query Date', 'Check-In', 'Check-Out'], inplace=True)

I also can find the min and max value of Query-Date, but I couldn't find a way to make a range for 3 columns.

  • https://stackoverflow.com/a/44102947/11356272 In this topic there's a code to fill these null gaps the previous and forward one, but it's not really my case, because I don't need the LAST previous and forward rows for the mean, but the the LAST previous and forward rows with the same nights.

I need something like that in pseudo code:

query_date = min(pd['Query Date'])
while(query_date != max(pd['Query Date'])):
   for(i in range(0, 30 + 1)):
       for(j in range(0,5 + 1)):
           if(check if row with Query-Date: query_date
                            and Check-In: query_date + (i days)
                            and Check-Out : query_date + (i + j days) is not exists):
                  add to pd new row with these Query-Date, Check-In, Check-out with price: (The last price value with the same + the forward price value with the same amount) / 2

Hopefully you guys could help me do that.

Dorki
  • 1,021
  • 2
  • 8
  • 23

0 Answers0