-1

When I run the aggregation code below using groupby().sum(), my Date series' format seems wrong: it seems to return results with future dates after 27.08.2019 even though my series has no dates after that:

                 Date        Obs
8189          2019-08-24   6.597940
8190          2019-08-24   0.758000
8191          2019-08-25  18.260892
8192          2019-08-25   6.590545
8193          2019-08-25   2.919198
8194          2019-08-25   0.838000
8195          2019-08-26   0.798000
8196          2019-08-26   6.597977
8197          2019-08-26  18.010977
8198          2019-08-26   2.882872
8199          2019-08-27  17.941132
8200          2019-08-27   0.847250
8201          2019-08-27   2.864728
8202          2019-08-27   6.730443

This code:

fiyat_w = fiyat_w.groupby('Date').sum()

fiyat_w = fiyat_w.reset_index()

returns aggregated series (daily sums of daily sensor measurements) with observations on future dates:

2057          2019-12-01  27.980334
2058          2019-12-02  24.340758
2059          2019-12-03  21.063112
2060          2019-12-04  25.989285
2061          2019-12-05  27.839916
2062          2019-12-06  27.301501
2063          2019-12-07  27.049580
2064          2019-12-08  26.325082

fiyat_w['Date'].describe()

Out[59]: 
count                    2065
unique                   2065
top       2016-11-14 00:00:00
freq                        1
first     2014-01-01 00:00:00
last      2019-12-08 00:00:00
Name: DATE_OF_TRANSACTION, dtype: object

What is the problem?

smci
  • 32,567
  • 20
  • 113
  • 146
omzeybek
  • 305
  • 2
  • 14
  • 1
    **Because your dates were in `mm.dd` format, but you flipped months with days when you read them in wrong**. See the linked question about converting datetimes right; you need to pass an explicit format string to `to_datetime`, don't rely on the default behavior; `pandas.read_csv(..., daysfirst)` is known to be buggy esp. for mixed-format dates. – smci Sep 23 '19 at 03:05

2 Answers2

1

Check the indicated source rows, e.g. 2057 thru 2064, maybe they are just as you put it in your question.

It looks like the month and day part in your dates are in reversed order.

Look also in other rows, whether the "middle" part (month) is always between 1 and 12. If not, it confirms my suspicion.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • I think the problem is related to the issue you mentioned. Before aggregation, my data has no problem in corresponding values with my original dates. But after aggregation DateTime object's format distorts. Is it possible to keep same date-time format prior to format change? My data spans from 01.01.2014 to 28.08.2019 and it is impossible to have values from Dec 2019 in aggregated data. – omzeybek Sep 22 '19 at 04:08
0

I don't understand. It works fine for me.

df
Out[6]: 
     row        Date        Obs
0   8189  24/08/2019   6.597940
1   8190  24/08/2019   0.758000
2   8191  25/08/2019  18.260892
3   8192  25/08/2019   6.590545
4   8193  25/08/2019   2.919198
5   8194  25/08/2019   0.838000
6   8195  26/08/2019   0.798000
7   8196  26/08/2019   6.597977
8   8197  26/08/2019  18.010977
9   8198  26/08/2019   2.882872
10  8199  27/08/2019  17.941132
11  8200  27/08/2019   0.847250
12  8201  27/08/2019   2.864728
13  8202  27/08/2019   6.730443

df2=df.groupby('Date')['Obs'].sum()

df2
Out[8]: 
Date
24/08/2019     7.355940
25/08/2019    28.608635
26/08/2019    28.289826
27/08/2019    28.383553
Name: Obs, dtype: float64
Humi
  • 609
  • 5
  • 15