1

I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value. The reason for this is that my dataframe contains a column with dates, like:

  ID         Date         
519457    25/02/2020 10:03    
519462    25/02/2020 10:07     
519468    25/02/2020 10:12
 ...           ...

And Numpy requires the format to be floating point numbers, as so:

  ID         Date         
519457    43886.41875     
519462    43886.42153     
519468    43886.425 
 ...         ...      

How can I make this change without having to modify the spreadsheet itself? I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple. I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.

The code I am using is the following

xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")

df["E_t"]=df["Date"].diff()

Any help or pointers would be really appreciated!

PS. I have seen solutions that require computing the exact number that wants to be obtained, but this is not possible in this case due to the size of the dataframes.

enricw
  • 263
  • 4
  • 19

2 Answers2

2

You can convert the date into the Unix timestamp. In python, if you have a datetime object in UTC, you can the timestamp() to get a UTC timestamp. This function returns the time since epoch for that datetime object.

Please see an example below-

from datetime import timezone
dt = datetime(2015, 10, 19)
timestamp = dt.replace(tzinfo=timezone.utc).timestamp()
print(timestamp)

1445212800.0

Please check the datetime module for more info.

ravi
  • 6,140
  • 18
  • 77
  • 154
1

I think you need:

#https://stackoverflow.com/a/9574948/2901002
@rewritten to vectorized solution
def excel_date(date1):
    temp = pd.Timestamp(1899, 12, 30)    # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return (delta.dt.days) + (delta.dt.seconds) / 86400

df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)

print (df)
       ID          Date
0  519457  43886.418750
1  519462  43886.421528
2  519468  43886.425000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252