I am trying to demonstrate my problem. I do not understand why a native Python <class 'datetime.datetime'> object is replaced with the pandas-specific <class 'pandas._libs.tslibs.timestamps.Timestamp'> object.

import datetime
import typing

import numpy as np
import pandas as pd
from dateutil.parser import parse

def _normalize_users_dataframe(row: pd.core.series.Series) -> pd.core.series.Series:
    last_seen: typing.Union[str, datetime.datetime] = row.get('last_seen', '')
    if last_seen:
        last_seen = parse(last_seen)
        row['last_seen'] = last_seen
        print(type(row['last_seen']).__mro__) # The value stored in the row is a native Python <class 'datetime.datetime'> object.
    return row

def process_users_dataframe(filepath: str) -> pd.core.frame.DataFrame:
    df: pd.core.frame.DataFrame = pd.read_csv(filepath, sep='\t')
    df.rename(columns=mapping, inplace=True)  # `mapping` is a dict of column renames defined elsewhere
    df.replace({np.nan: None}, inplace=True)
    df = df.apply(_normalize_users_dataframe, axis=1)
    print(type(df['last_seen'][0]).__mro__) # After apply(), the value is a pandas-specific <class 'pandas._libs.tslibs.timestamps.Timestamp'> object.
    return df


def main() -> None:
    process_users_dataframe('<dir>')

Inside the _normalize_users_dataframe() function, when I print the type of the last_seen value, it is <class 'datetime.datetime'>, which is fine. But after running the apply() method on the DataFrame, which returns a new DataFrame object, the type of the values in the last_seen column becomes <class 'pandas._libs.tslibs.timestamps.Timestamp'>.

How does this happen? Is it a deep implementation detail?
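
The code above depends on a file and a `mapping` that are not shown, so here is a minimal sketch that should reproduce the same behaviour on its own. The `normalize` function, the `user_id` column, and the sample values are hypothetical stand-ins, and `datetime.datetime.fromisoformat` is used instead of `dateutil` only to keep the example dependency-free.

```python
import datetime

import pandas as pd


def normalize(row: pd.Series) -> pd.Series:
    # Hypothetical stand-in for _normalize_users_dataframe():
    # replace the string with a native datetime.datetime object.
    row = row.copy()
    row["last_seen"] = datetime.datetime.fromisoformat(row["last_seen"])
    return row


# Made-up sample data; the real data comes from a tab-separated file.
df = pd.DataFrame(
    {
        "user_id": [1, 2],
        "last_seen": ["2021-06-07 15:39:00", "2021-06-07 15:43:00"],
    }
)

# apply(axis=1) collects the returned rows into a new DataFrame and
# re-infers the dtype of each column while doing so.
result = df.apply(normalize, axis=1)

print(result["last_seen"].dtype)
print(type(result["last_seen"].iloc[0]))
```

As far as I can tell, the conversion happens when the returned rows are assembled into the new DataFrame, not inside the row function itself.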

shzetb
  • https://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64 Maybe the second answer in this question will help? – Chris Jun 07 '21 at 15:39
  • @Chris Thank you for the reply. I can fix this, but the question is why I am getting this behaviour. – shzetb Jun 07 '21 at 15:43
  • why would you use `datetime.datetime` in the first place? Note that pandas will always try to use its own datetime (pd.Timestamp), unless you have a mixed datatype Series. – FObersteiner Jun 07 '21 at 15:47
  • [Pandas supports](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) the `numpy.timedelta64` and `numpy.datetime64` dtypes for datetime operations. It is not built upon the standard library, and there are many inconsistencies between the two (mostly with timezone handling). Really, if working with pandas you shouldn't use `datetime` – ALollz Jun 07 '21 at 15:47
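
What the comments describe can be seen at the Series level without any `apply()` call. This is a small sketch with made-up values, not the code from the question: a list of `datetime.datetime` objects is stored as a datetime64 column unless `dtype=object` is requested explicitly.

```python
import datetime

import pandas as pd

native = datetime.datetime(2021, 6, 7, 15, 39)

# Default construction: pandas infers a datetime64 dtype, so element
# access hands back a pandas Timestamp.
inferred = pd.Series([native])
first_inferred = inferred.iloc[0]

# Requesting object dtype keeps the original Python objects untouched.
as_object = pd.Series([native], dtype=object)
first_object = as_object.iloc[0]

print(inferred.dtype, type(first_inferred))
print(as_object.dtype, type(first_object))

# Timestamp is a subclass of datetime.datetime, so isinstance() checks
# against datetime.datetime still pass for the inferred value.
print(isinstance(first_inferred, datetime.datetime))
```

Because `Timestamp` subclasses `datetime.datetime`, code that only relies on `isinstance` checks or the standard datetime attributes usually keeps working after the conversion.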

0 Answers