1

I dont know how to transform data in my column 'datetime' with this format:

2020-01-01T00:00:00.000+01:00

in to:

Jan-2020

I've tried with this:

works_data["datetime"] = pd.to_datetime(works_data["datetime"], utc=True).dt.strftime('%b-%Y')

but doens't work... it returns the date in that format but with one less month. For example, for 2020-01-01T00:00:00.000+01:00 returns Dec-2019 ... I dont know why....

Could anybody help me with this problem?

EDIT:

The actual code that is working.

import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
import datetime
import time

url = "https://apidatos.ree.es/es/datos/generacion/estructura-generacion?start_date=2020-01-01T00:00&end_date=2020-12-31T22:00&time_trunc=month"


response = requests.get(url)
data = response.json()
df = pd.DataFrame(data['included'])


works_data = pd.json_normalize(data=df['attributes'], record_path='values', 
                            meta=['title']) 

And if i print the works_data, it returns:

value   percentage  datetime    title
0   3.726053e+06    0.163889    2020-01-01T00:00:00.000+01:00   Hidráulica
1   2.837911e+06    0.139561    2020-02-01T00:00:00.000+01:00   Hidráulica
2   3.112718e+06    0.148076    2020-03-01T00:00:00.000+01:00   Hidráulica
3   2.861429e+06    0.163464    2020-04-01T00:00:00.000+02:00   Hidráulica
4   2.858050e+06    0.159065    2020-05-01T00:00:00.000+02:00   Hidráulica
... ... ... ... ...
107 3.504687e+04    0.002002    2020-04-01T00:00:00.000+02:00   Residuos renovables
108 3.665093e+04    0.002040    2020-05-01T00:00:00.000+02:00   Residuos renovables
109 4.231355e+04    0.002197    2020-06-01T00:00:00.000+02:00   Residuos renovables
110 4.313452e+04    0.001803    2020-07-01T00:00:00.000+02:00   Residuos renovables
111 3.159610e+04    0.003548    2020-08-01T00:00:00.000+02:00   Residuos renovables

Any idea how to continue to transform the datetime column?

rogarui
  • 97
  • 1
  • 9

2 Answers2

1

As Quang Hoang already mentioned, you are converting UTC+1 into UTC, which is one hour back. Ergo,pd.to_datetime("2020-01-01T00:00:00.000+01:00", utc=True) resolves to Timestamp('2019-12-31 23:00:00+0000', tz='UTC')

It seems like you don't want to convert the date into UTC, so don't set the corresponding parameter to True.

Converting Strings

import pandas as pd
pd.to_datetime("2020-01-01T00:00:00.000+01:00").strftime("%b-%Y")
>>> 'Jan-2020'

Alternatively, using only the datetime module:

from datetime import datetime as dt
dt.fromisoformat("2020-01-01T00:00:00.000+01:00").strftime("%b-%Y")
>>> 'Jan-2020'

Converting Series

If you want to convert the entire column of datetimes, either use list comprehension or Pandas Series.dt.strftime function.

import pandas as pd
dates = pd.Series(["2020-03-01T00:00:00.000+01:00", "2020-01-01T00:00:00.000+01:00", "2020-06-01T00:00:00.000+01:00"])
pd.to_datetime(dates).dt.strftime("%b-%Y")

>>> 0    Mar-2020
>>> 1    Jan-2020
>>> 2    Jun-2020
>>> dtype: object

The above approach will fail when you have different timezone offsets in your Series. You will receive a ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True. In such a case, you can use apply to remove the timezone information for each datetime or use a list comprehension as follows.

df["column"] = [pd.to_datetime(date).strftime("%b-%Y") for date in dates]
sabacherli
  • 180
  • 1
  • 6
  • Hi @sabacherli Thanks for answering!! If I don't set the utc=True, i have this error....Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True – rogarui Aug 11 '20 at 20:57
  • If I use your first option // `pd.to_datetime(works_data["datetime"]).strftime("%b-%Y")` I have this error.... 'Series' object has no attribute 'strftime' – rogarui Aug 11 '20 at 21:02
  • For more info, my 'datetime' column is a Dtype = object – rogarui Aug 11 '20 at 21:08
  • I have the same problem....`Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True` – rogarui Aug 12 '20 at 07:21
  • I just post all my code, could you please check it? – rogarui Aug 12 '20 at 07:44
  • `works_data["datetime"] = [pd.to_datetime(date).strftime("%b-%Y") for date in works_data["datetime"]]` seems to work. – sabacherli Aug 12 '20 at 07:56
1

For a fixed UTC offset: localize the date/time column to None before strftime:

pd.to_datetime("2020-01-01T00:00:00.000+01:00").tz_localize(None).strftime('%b-%Y')
Out[47]: 'Jan-2020'

See also my answer here to get more info what's happening in the background.


Since your input seems to contain mixed UTC offsets, conversion via to_datetime will yield a Series of datetime.datetime objects (dtype object, as opposed to dtype datetime64). You'll need to replace the tzinfo property with None:

pd.to_datetime(works_data["datetime"]).apply(lambda t: t.replace(tzinfo=None)).dt.strftime('%b-%Y')
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • Thanks @MrFruppes I've tried it but returns me this.... `index is not a valid DatetimeIndex or PeriodIndex` – rogarui Aug 12 '20 at 06:59
  • @rogarui: hm, seems like you want to apply this to your df's index and it wasn't casted `to_datetime` before? – FObersteiner Aug 12 '20 at 07:20
  • I don't know what is happening...i have created my df and when I try to transform the column returns me so many errors... do you prefer if i post the entire code? – rogarui Aug 12 '20 at 07:28
  • I just post all my code, could you please check it? @MrFuppes – rogarui Aug 12 '20 at 07:45
  • @rogarui - your data contains mixed UTC offsets. `pd.to_datetime` fails to infer the according `pytz.fixed_offset`s, so... time for a `lambda` :) (see the edit of my answer) – FObersteiner Aug 12 '20 at 08:42