Let's say I have a dataframe with two datetime columns and I want to analyze the difference between them:
import pandas as pd
csv = [
['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5]
]
df = pd.DataFrame(csv, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = (df['delivery_date']-df['dispatch_date'])
df = df.set_index(['delivery_date','transit_time'])
OK, so now we have something like this:
                                    dispatch_date  order_size
delivery_date transit_time
2019-08-03    1 days 09:00:00 2019-08-01 15:00:00           4
              1 days 14:00:00 2019-08-01 10:00:00           6
              1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
              1 days 11:00:00 2019-08-02 13:00:00           4
              1 days 13:00:00 2019-08-02 11:00:00           5
Let's say, for example, that for each delivery date I want to know which delivery was the fastest (shortest transit time). I want to save the result to a new dataframe with all the columns from the original dataframe, so I iterate like this:
delivery_dates = df.index.get_level_values(0).unique()
df_ouput = pd.DataFrame()
for date in delivery_dates:
    df_analyzed = df.loc[(date, )].sort_index()
    df_result = df_analyzed.iloc[[df_analyzed.index.get_loc(0, method='nearest')]]
    df_result.loc[:,'delivery_date'] = date
    df_ouput = df_ouput.append(df_result)
df_ouput = df_ouput.reset_index().set_index(['delivery_date'])
And the result is correct:
                 transit_time       dispatch_date  order_size
delivery_date
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
But I get the warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I don't understand why, because I am already using the ".loc" method for the assignment:
df_result.loc[:,'delivery_date'] = date
I couldn't get rid of the warning that way, so I came up with this odd workaround:
delivery_dates = df.index.get_level_values(0).unique()
df_ouput = pd.DataFrame()
for date in delivery_dates:
    df_analyzed = df.loc[(date, )].sort_index()
    df_result = df_analyzed.iloc[[df_analyzed.index.get_loc(0, method='nearest')]]
    df_result_2 = df_result.copy()
    df_result_2.loc[:,'delivery_date'] = date
    df_ouput = df_ouput.append(df_result_2)
df_ouput = df_ouput.reset_index().set_index(['delivery_date'])
If I make a copy, no warning is displayed. But why? Is there a better way to do what I want?
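For comparison, here is a minimal sketch of one loop-free way to get the same two rows. It is only one possible approach, not necessarily the best one. It swaps the per-date loop for a sort followed by keeping the first row per delivery date, so nothing is ever assigned into a slice of another dataframe. The names `rows` and `df_fastest` are made up for this illustration, and the sketch starts from the flat dataframe (before `set_index`). It also avoids `DataFrame.append` and `get_loc(..., method=...)`, which are no longer available as of pandas 2.0.

```python
import pandas as pd

# Same sample data as above ("rows" is a hypothetical name for this sketch).
rows = [
    ['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
    ['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
    ['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
    ['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
    ['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
    ['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5],
]
df = pd.DataFrame(rows, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = df['delivery_date'] - df['dispatch_date']

# Sort so the shortest transit time comes first within each delivery date,
# then keep only that first row per date. On ties, the first row after
# sorting is kept.
df_fastest = (
    df.sort_values(['delivery_date', 'transit_time'])
      .drop_duplicates('delivery_date', keep='first')
      .set_index('delivery_date')
)

print(df_fastest)
```

With the sample data this should select the orders of size 8 and 3, the same rows as the loop. The column order differs from the loop's output, since `transit_time` stays where it was created instead of being moved out of the index.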