I am trying to convert a time series data from long to wide format. The data is given below as follows.
+======+==========+======+======+
| Name | Date | Val1 | Val2 |
+======+==========+======+======+
| A | 1/1/2018 | 1 | 2 |
+------+----------+------+------+
| B | 1/1/2018 | 2 | 3 |
+------+----------+------+------+
| C | 1/1/2018 | 3 | 4 |
+------+----------+------+------+
| D | 1/4/2018 | 4 | 5 |
+------+----------+------+------+
| A | 1/4/2018 | 5 | 6 |
+------+----------+------+------+
| B | 1/4/2018 | 6 | 7 |
+------+----------+------+------+
| C | 1/4/2018 | 7 | 8 |
+------+----------+------+------+
My end goal is to create a pivot table for a date range of 01/01/2018
to 01/04/2018
. Since there is no value at dates 01/02/2018
or 01/03/2018
I would expect them to be filled with NaN
. For further simplicity this is what my final table would look like:
+---+---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
| | Val1.1/1/2018 | Val2.1/1/2018 | Val1.1/2/2018 | Val2.1/2/2018 | Val1.1/3/2018 | Val2.1/3/2018 | Val1.1/4/2018 | Val2.1/4/2018 |
+---+---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
| A | 1 | 2 | NULL | NULL | NULL | NULL | 5 | 6 |
| B | 2 | 3 | NULL | NULL | NULL | NULL | 6 | 7 |
| C | 3 | 4 | NULL | NULL | NULL | NULL | 7 | 8 |
| D | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 5 |
+---+---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
There are two steps as per my understanding to get above table.
First is to fill data in long format with dates not present in range between 01/01/2018
to 01/04/2018
i.e. is 01/02/2018
and 01/03/2018
.
Second and the final step would be to pivot data in wide format.
To achieve the first step I referred to this post.
As per the answer if there are similar dates in multiple rows df.reindex(date_range)
would raise following error, ValueError: cannot reindex from a duplicate axis
which is true and to overcome that I followed the following piece of code.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df.set_index('Date', inplace = True)
date_range = pd.date_range('2018-01-01', '2018-01-04', freq='D')
df = df.loc(date_range)
The above code goves me the following error:
TypeError: unhashable type: 'DatetimeIndex'
I resolved the above issue using this line of code
df = df.loc[date_range,:]
While the I am able to get the desired long format but Python gives a warning as follows:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
The above warning, I think suggests that my way of achieving the long format table with missing dates is not correct, is that correct? If yes, than how should I go about doing it? Moreover, how would I reach to to wide format table which I want as my final table?
Edit:
I've reached to the pivot table containing dates only '01/01/2018'
and '01/04/2018'
. Following is the piece of code.
df1 = df.pivot_table(index='Name', columns='Date', aggfunc='sum')