How can I extract only the date (not the time) using Pandas' to_datetime?
Let's say I have this string:
>>> date_string = "1975-02-23 02:58:41+00:00"
If I wanted just the year then this works well:
>>> import pandas as pd
>>> pd.to_datetime(date_string, format="%Y", exact=False)
Timestamp('1975-01-01 00:00:00')
But this technique seems to break down if I want, say, the year and month, or the year, month, and day:
>>> pd.to_datetime(date_string, format="%Y-%m", exact=False)
Timestamp('1975-02-23 02:58:41+0000', tz='UTC')
>>> pd.to_datetime(date_string, format="%Y-%m-%d", exact=False)
Timestamp('1975-02-23 02:58:41+0000', tz='UTC')
Why is it that Pandas is extracting more than I wanted? How can I limit parsing to only what I specify?
I'm specifically interested in the parsing process. For example, let's say the string contained errors in the "hours" portion, and I didn't care about that. I would like to parse only the year-month-day portion.
>>> date_string_with_error = "1975-02-23 25:58:41+00:00"
>>> pd.to_datetime(date_string_with_error, format="%Y", exact=False)
Timestamp('1975-01-01 00:00:00')
>>> pd.to_datetime(date_string_with_error, format="%Y-%m", exact=False)
Traceback (most recent call last):
File "/Users/matthew/.local/share/virtualenvs/PyOgg-ewux8jXO/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2054, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas/_libs/tslibs/conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/matthew/.local/share/virtualenvs/PyOgg-ewux8jXO/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 830, in to_datetime
result = convert_listlike(np.array([arg]), format)[0]
File "/Users/matthew/.local/share/virtualenvs/PyOgg-ewux8jXO/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 459, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/Users/matthew/.local/share/virtualenvs/PyOgg-ewux8jXO/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2059, in objects_to_datetime64ns
raise e
File "/Users/matthew/.local/share/virtualenvs/PyOgg-ewux8jXO/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2044, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 496, in pandas._libs.tslib.array_to_datetime
ValueError: time data 1975-02-23 25:58:41+00:00 doesn't match format specified
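For comparison, here is a minimal sketch of the result I'm after, obtained by slicing the string before handing it to to_datetime. It assumes the date always occupies the first 10 characters in YYYY-MM-DD form (and the year-month the first 7), which is an assumption about my data rather than something to_datetime enforces. The names date_part and month_part are only for illustration.

```python
import pandas as pd

date_string_with_error = "1975-02-23 25:58:41+00:00"

# Assumption: the date is always the first 10 characters (YYYY-MM-DD).
date_part = date_string_with_error[:10]

# Only the sliced text reaches the parser, so the invalid hour is never seen.
parsed_day = pd.to_datetime(date_part, format="%Y-%m-%d")
print(parsed_day)  # 1975-02-23 00:00:00

# Same idea for year and month only: the first 7 characters (YYYY-MM).
month_part = date_string_with_error[:7]
parsed_month = pd.to_datetime(month_part, format="%Y-%m")
print(parsed_month)  # 1975-02-01 00:00:00
```

This works around the problem rather than explaining it: I would still like to know whether to_datetime itself can be told to stop parsing where the format ends.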