I'm trying to understand how to add a row that contains a timestamp to a pandas DataFrame that has a column with a data type of datetime64[ns, UTC]. Unfortunately, when I add a row, the column's data type changes to object, which breaks conversion to an R data frame via rpy2.

Here are the relevant lines of code where I'm seeing the problem, with debug print statements around them whose output I'll share as well. The variable observation is a plain Python list whose first value is a timestamp. Code:

print('A: df.dtypes[0] = {}'.format(str(df.dtypes[0])))
print('observation[0].type = {}, observation[0].tzname() = {}'.format(str(type(observation[0])), observation[0].tzname()))
df.loc[len(df)] = observation
print('B: df.dtypes[0] = {}'.format(str(df.dtypes[0])))

Here is the output of the above code snippet:

A: df.dtypes[0] = datetime64[ns, UTC]
observation[0].type = <class 'datetime.datetime'>, observation[0].tzname() = UTC
B: df.dtypes[0] = object

What I'm observing is that the datatype of the column is being changed when I append the row. As far as I can tell, Pandas is adding the timestamp as an instance of . The rpy2 pandas2ri module seems to be unable to convert values of that class.

I've so far been unable to find an approach that lets me append a row to the data frame and preserve the column type for the timestamp column. Suggestions would be welcome.
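One possible fallback, sketched here under assumptions and not taken from the code above, is to cast the column back after the append has already turned it into object. The frame, column names, and values below are hypothetical stand-ins; the object column is built by hand to mimic state "B", and pd.to_datetime with utc=True is used to restore a timezone-aware datetime dtype.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the frame after the append: the timestamp column
# holds tz-aware datetime.datetime objects but has dtype object (state "B").
stamps = pd.Series(
    [
        dt.datetime(2020, 5, 18, 19, 0, tzinfo=dt.timezone.utc),
        dt.datetime(2020, 5, 18, 19, 59, tzinfo=dt.timezone.utc),
    ],
    dtype=object,
)
df = pd.DataFrame({"timestamp": stamps, "value": [1.0, 2.5]})
print("before:", df.dtypes["timestamp"])

# Parse the object column back into a timezone-aware datetime column.
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
print("after:", df.dtypes["timestamp"])
```

This repairs the dtype after the fact rather than preventing the change, so it would have to run before the frame is handed to rpy2.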

==========================

Update

I've been able to work around the problem in a hacky way. I create a one-row temporary dataframe from the list of values, then set the types on the columns for this one-row dataframe. Then I append the row from this temporary dataframe to the one I'm working on. This is the only approach I was able to identify that preserves the column type of the dataframe I'm appending to. It's almost enough to make me pine for a strongly typed language.
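For anyone who wants to see the shape of that workaround, here is a minimal sketch. The column names, the sample values, and the use of pd.concat for the append step are assumptions for illustration; my actual frame and code differ.

```python
import datetime as dt

import pandas as pd

# Hypothetical frame standing in for the real df; the first column is
# timezone-aware (UTC).
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2020-05-18 19:00:00"], utc=True),
        "value": [1.0],
    }
)
target_dtypes = df.dtypes.to_dict()  # remember the dtypes we want to keep

# Hypothetical observation: a plain list whose first value is a timestamp.
observation = [dt.datetime(2020, 5, 18, 19, 59, tzinfo=dt.timezone.utc), 2.5]

# 1. Build a one-row temporary frame from the list of values.
row = pd.DataFrame([observation], columns=df.columns)

# 2. Set the column types on the temporary frame to match the target frame.
row = row.astype(target_dtypes)

# 3. Append the typed row to the frame being built up.
df = pd.concat([df, row], ignore_index=True)

print(df.dtypes)
```

Because both frames carry identical dtypes at the moment they are combined, the timestamp column has no reason to fall back to object.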

I'd prefer a more elegant solution, so I'm leaving this open in case anyone can suggest one.

1 Answer

Check this post for an answer, especially the answer by Wes McKinney:

Converting between datetime, Timestamp and datetime64

  • 1
    I read that post, and it doesn't seem to answer the question. If I convert the first argument of the observation list to a numpy.datetime64, I get the same behavior that the column datatype changes to "object" from "datetime64[ns, UTC]", and conversion of the dataframe to an R object fails because it cannot determine how to convert an object. – Eric Wittle May 18 '20 at 19:59