14

I have a Pandas dataframe and I need to convert a column with dates to int but unfortunately all the given solutions end up with errors (below)

test_df.info()

<class 'pandas.core.frame.DataFrame'>
Data columns (total 4 columns):
Date        1505 non-null object
Avg         1505 non-null float64
TotalVol    1505 non-null float64
Ranked      1505 non-null int32
dtypes: float64(2), int32(1), object(1) 

sample data:

    Date        Avg             TotalVol  Ranked
0   2014-03-29  4400.000000     0.011364    1
1   2014-03-30  1495.785714     4.309310    1
2   2014-03-31  1595.666667     0.298571    1
3   2014-04-01  1523.166667     0.270000    1
4   2014-04-02  1511.428571     0.523792    1

I think that I've tried everything but nothing works

test_df['Date'].astype(int):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'datetime.date'

test_df['Date']=pd.to_numeric(test_df['Date']):

TypeError: Invalid object type at position 0

test_df['Date'].astype(str).astype(int):

ValueError: invalid literal for int() with base 10: '2014-03-29'

test_df['Date'].apply(pd.to_numeric, errors='coerce'):

Converts the entire column to NaNs

jabba
  • 503
  • 2
  • 6
  • 14
  • What are you hoping to get out of converting dates to ints? Days since 1/1/1970? The integer representation of the month? The string representation without hyphens all concatenated? How can pandas know what you want out of this operation if we don't? – TayTay Jun 14 '18 at 18:26
  • What int do you expect? For instance, what should `2014-03-29`, and what should `2014-03-30` be? – sacuL Jun 14 '18 at 18:26
  • The expected output is a string representation without hyphens all concatenated – jabba Jun 14 '18 at 18:42

3 Answers3

18

The reason why test_df['Date'].astype(int) gives you an error is that your dates still contain hyphens "-". First suppress them by doing test_df['Date'].str.replace("-",""), then you can apply your first method to the resulting series. So the whole solution would be :

test_df['Date'].str.replace("-","").astype(int)

Note that this won't work if your "Date" column is not a string object, typically when Pandas has already parsed your series as TimeStamp. In this case you can use :

test_df['Date'].dt.strftime("%Y%m%d").astype(int)
Neroksi
  • 1,301
  • 1
  • 12
  • 20
11

Looks like you need pd.to_datetime().dt.strftime("%Y%m%d").

Demo:

import pandas as pd
df = pd.DataFrame({"Date": ["2014-03-29", "2014-03-30", "2014-03-31"]})
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime("%Y%m%d")
print( df )

Output:

       Date
0  20140329
1  20140330
2  20140331
Rakesh
  • 81,458
  • 17
  • 76
  • 113
0

This should work

df['Date'] = pd.to_numeric(df.Date.str.replace('-',''))
print(df['Date'])
0    20140329
1    20140330
2    20140331
3    20140401
4    20140402
msolomon87
  • 56
  • 1
  • 6