I have a dataframe in which I compute the difference between two date columns:

difference=(df["date1"]-df["date2"]).dt.days

Then, when I try to add the result to the existing dataframe, I get error messages. If I do:

df.assign(difference) 

I get:

TypeError: assign() takes 1 positional argument but 2 were given
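
This `TypeError` arises because `DataFrame.assign` accepts new columns only as keyword arguments, where the keyword becomes the column name. A minimal sketch, assuming a small made-up frame that reuses the `date1` and `date2` column names from the question (the values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2020-03-31", "2020-03-10"]),
    "date2": pd.to_datetime(["2020-02-28", "2020-03-01"]),
})

difference = (df["date1"] - df["date2"]).dt.days

# assign() returns a new dataframe and leaves the original untouched,
# so the result has to be bound to a name.
df = df.assign(difference=difference)
print(df)
```

For a column name that is not a valid Python identifier, such as one containing a space, the name can be passed through dictionary unpacking: `df.assign(**{"Difference value": difference})`.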

If I do:

df["Diference value"]=difference

I get:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
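
This message is a warning rather than an error. It usually appears when the dataframe being modified was itself obtained by selecting rows or columns from another dataframe. The question does not show how `df` was built, so the following is only a sketch under that assumption: taking an explicit `.copy()` of the selection before adding the column. The `full` frame and its `sector` and `other` columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset.
full = pd.DataFrame({
    "sector": ["a", "b", "a"],
    "date1": pd.to_datetime(["2020-03-31", "2020-03-10", "2020-04-02"]),
    "date2": pd.to_datetime(["2020-02-28", "2020-03-01", "2020-04-01"]),
    "other": [1, 2, 3],
})

# Selecting a few columns and calling .copy() gives an independent dataframe,
# so adding a column to it afterwards is unambiguous.
df = full[["sector", "date1", "date2"]].copy()
df["Difference value"] = (df["date1"] - df["date2"]).dt.days
print(df)
```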

In both cases the last row is filled with NaN.

Anyway, I carry on with this new dataframe, but when I call groupby (which works fine) and then get_group("Diference value"), I get:

> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-46-71486a5f3be6> in <module>
> ----> 1 dias=sectores.get_group("Difference value")
> 
> D:\ArchivosProgramas\Anaconda\envs\pandas_playground\lib\site-packages\pandas\core\groupby\groupby.py in get_group(self, name, obj)
>     685         inds = self._get_index(name)
>     686         if not len(inds):
> --> 687             raise KeyError(name)
>     688 
>     689         return obj._take_with_is_copy(inds, axis=self.axis)
> 
> KeyError: 'Difference value'
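
The `KeyError` is consistent with how `get_group` works: it looks up one of the group keys (a value from the column passed to `groupby`), not a column name. A minimal sketch, assuming the grouping column is called `sector` (a hypothetical name; the question only shows the variable `sectores`) and using made-up values:

```python
import pandas as pd

# Hypothetical data; "sector" is an assumed name for the grouping column.
df = pd.DataFrame({
    "sector": ["north", "south", "north"],
    "Difference value": [32, 9, 1],
})

sectores = df.groupby("sector")

# get_group expects a group key, i.e. a value found in the "sector" column.
north = sectores.get_group("north")

# To work with one column across all groups, select it on the groupby object.
mean_days = sectores["Difference value"].mean()
print(north)
print(mean_days)
```

Note also that the column was created as `"Diference value"` while the traceback looks up `"Difference value"`, so the two spellings would need to match when selecting the column.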

I don't know where the error starts or how to fix it. All I need is this dataframe with the new column, so that I can then group it normally. I've been trying to solve this all day. Any help is appreciated. Thanks.

Trenton McKinney
John Doe
  • Does this answer your question? [Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes](https://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu) – Trenton McKinney Aug 20 '20 at 23:53

2 Answers

Just this should do:

import pandas as pd

# Make sure both columns are datetimes before subtracting
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
df['difference'] = (df['date1'] - df['date2']).dt.days
print(df)

       date1      date2  difference
0 2020-02-28 2020-03-31         -32
NYC Coder
  • Thank you. I'm trying it now but I get the message: – John Doe Aug 21 '20 at 16:21
  • "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead". Even when I use .loc, the warning still appears. The real problem is that I can't use groupby properly on that dataframe: it raises a KeyError. That is my main point, and I can't get past it. – John Doe Aug 21 '20 at 16:41
  • How are you getting the dataframe? Where and how are you populating it? I don't think the problem is with the date difference. Because this code works perfectly fine. – NYC Coder Aug 21 '20 at 16:44
  • The dataframe comes from a public COVID-19 database for Mexico City (here it is in case you want to take a look: https://datos.cdmx.gob.mx/explore/dataset/base-covid-sinave/table/). From it I select about four columns and try to do some analysis. No, the date difference is not the problem; the problem is that I can't get a group after groupby. – John Doe Aug 22 '20 at 17:58

See example below:


df.head()
           date1                   date2
0   2020-01-07 08:24:25     2020-07-28 01:34:44
1   2020-01-06 10:32:18     2020-03-21 17:13:07
2   2020-01-07 08:34:01     2020-03-21 17:13:09
3   2020-05-02 11:13:18     2020-07-18 21:57:11
4   2020-01-11 12:56:22     2020-04-02 21:28:15

# Create the diff column (date2 - date1, which matches the positive values shown below):

df['diff'] = (df["date2"] - df["date1"]).dt.days
df.head()

# This results in:

           date1                    date2          diff
0   2020-01-07 08:24:25     2020-07-28 01:34:44     202
1   2020-01-06 10:32:18     2020-03-21 17:13:07     75
2   2020-01-07 08:34:01     2020-03-21 17:13:09     74
3   2020-05-02 11:13:18     2020-07-18 21:57:11     77
4   2020-01-11 12:56:22     2020-04-02 21:28:15     82
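
The sign and size of the result depend on the order of subtraction, because `.dt.days` returns the whole-day component of each timedelta, which is rounded toward negative infinity. A small sketch using the first row of the sample data above (the variable names are illustrative only):

```python
import pandas as pd

d1 = pd.to_datetime(pd.Series(["2020-01-07 08:24:25"]))
d2 = pd.to_datetime(pd.Series(["2020-07-28 01:34:44"]))

# Later date minus earlier date: positive whole days.
forward = (d2 - d1).dt.days

# Earlier date minus later date: negative, and one day "further" than the
# forward result because the partial day is floored.
backward = (d1 - d2).dt.days

# Fractional days, if the partial day matters.
fractional = (d2 - d1).dt.total_seconds() / 86400

print(forward.iloc[0], backward.iloc[0], fractional.iloc[0])
```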
Mel