0

I have a dataframe with two datetime columns and a third column with a numeric value. Here is an example:

2019-01-01 00:00:00 2019-12-31 00:00:00 118433.0
2020-01-01 00:00:00 2020-12-31 00:00:00 120087.0
2021-01-01 00:00:00 2021-06-30 00:00:00 63831.0
2021-07-01 00:00:00 2021-12-31 00:00:00 63089.0
2022-01-01 00:00:00 2022-06-30 00:00:00 60753.0
2022-07-08 00:00:00 2022-11-30 00:00:00 9067.17

As you can see while 2019 and 2020 are full years, the other rows describe different time spans.

I need to sum all the numeric values pertaining to the same year to get something like:

2019 118422.0
2020 120087.0
2021 123842.0
2022 9067.0

The from-to dates on every row always fall in the same year.

Any given year does not have to be a full year.

I'd love to avoid simple iterations and learn a properly pythonic way of achieving this (list comprehension / vectorization).

Thank you

Robert Alexander
  • 875
  • 9
  • 24
  • Whats happens if not full year? – jezrael Dec 13 '22 at 14:03
  • Are there same years in both columns? – jezrael Dec 13 '22 at 14:08
  • @mozway - so extreme closing? 2 different solutions together? :( – jezrael Dec 13 '22 at 14:12
  • 1
    Let's be honest this is not a tricky question: `df.groupby(pd.to_datetime(df[0]).dt.year).sum(numeric_only=True)` is it really benefiting the Q/A database to add such a poor case? There are dozens of groupby.sum on the site. I'm sure there is an exact duplicate somewhere – mozway Dec 13 '22 at 14:14
  • Plus, I'm sure that if **I** had answered it, you would have found a duplicate :p – mozway Dec 13 '22 at 14:16
  • @mozway - I still thinking if same year in both or not – jezrael Dec 13 '22 at 14:16
  • @mozway - Absolutely not, not 100% dupe. – jezrael Dec 13 '22 at 14:16
  • If this was the case, then the question is too unclear, also a reason to close. – mozway Dec 13 '22 at 14:17
  • @mozway - yop, it is unclear, agree. So ask in comments. – jezrael Dec 13 '22 at 14:17
  • Sorry for not having specified better. a) As fas as I can see in the real data the two date columns are always in the same year. b) if not a full year, as in the example data and desired result for 2022, I just get the sum of whatever period falls in said year. Hope my wobbly english can be understood. – Robert Alexander Dec 13 '22 at 14:22
  • 1
    @RobertAlexander - OK, so working well `df.groupby(pd.to_datetime(df[0]).dt.year).sum(numeric_only=True)`, right? – jezrael Dec 13 '22 at 14:23
  • That gives a small error, but this works: print(df.groupby(pd.to_datetime(df["dal"]).dt.year).sum(numeric_only=True)) thanks a lot – Robert Alexander Dec 13 '22 at 15:07

0 Answers0