1

Consider the following table 'df':

    date        sales  
0   2021-04-10  483  
1   2022-02-03  226  
2   2021-09-23  374  
3   2021-10-17  186  
4   2021-07-17   35

I would like to convert the column date that is currently a string to a date by using apply() and datetime.strptime().

I tried the following:

format_date = "%Y-%m-%d"
df["date_new"] = df.loc[:,"date"].apply(datetime.strptime,df.loc[:,"date"],format_date)

I get the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried different syntaxes (using the args and **kwds arguments of apply()), but I always get an error, such as:

apply() takes exactly 2 arguments (3 given)

Can someone help me with the syntax? Thank you.

rahlf23
Theo
  • If you need to convert string date to datetime, you just need df['date'] = pd.to_datetime(df['date']) – Vaishali Oct 03 '18 at 18:40
  • Why look for the easy solution when you can make it hard... Thank you very much! Still, do you see a way to combine apply() and datetime.strptime()? I am still curious to know why it does not work. – Theo Oct 03 '18 at 18:44
  • @Tousalouest see my answer below in regards to your question – Ravi Patel Oct 03 '18 at 18:45
  • @Tousalouest, for the sake of learning the syntax with strptime, you can try df['date'].apply(lambda row: datetime.strptime(row, format_date)). Though you should use to_datetime. – Vaishali Oct 03 '18 at 18:55
  • Also, I don't think this is a duplicate. The OP specifically asked for the "Syntax to use df.apply() with datetime.strptime", not "how do I convert to date format". This applies to answers suggesting the use of pd.to_datetime as well. – Ravi Patel Oct 03 '18 at 18:56
  • As mentioned below, I accept 'pd.to_datetime' as the ideal solution. However, rahlf23's answer is a better fit for my question. – Theo Oct 03 '18 at 19:00

2 Answers

2

You should use pd.to_datetime():

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
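For completeness, here is a minimal sketch of that conversion on the table from the question. The DataFrame is rebuilt by hand from the sample rows shown there, so treat the construction as an assumption about how the data is actually loaded.

```python
import pandas as pd

# Rebuild the sample table from the question (assumed to match the real data).
df = pd.DataFrame({
    "date": ["2021-04-10", "2022-02-03", "2021-09-23", "2021-10-17", "2021-07-17"],
    "sales": [483, 226, 374, 186, 35],
})

# Parse the whole column at once; the explicit format avoids format guessing.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Once the column is a datetime dtype, the .dt accessor becomes available.
years = df["date"].dt.year.tolist()
```

After the conversion, operations such as sorting by date or extracting the year work on the whole column without a Python-level loop.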
rahlf23
1

You can do this:

df['date_new'] = df['date'].map(lambda date_string: datetime.strptime(date_string, format_date))

Since you only need data from a single column, use .map on that column: it calls the function on each element. DataFrame.apply instead passes an entire row or column to the function at once.

If you must use apply:

df['date_new'] = df.apply(lambda row: datetime.strptime(row['date'], format_date), axis=1)

The key here is axis=1, which makes apply pass each row to the function.
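As for why the original call fails: as far as I can tell, Series.apply already supplies each element as the first argument to the function, so the column should not be passed again. In the pandas version of that time, the extra positional arguments landed in the convert_dtype and args parameters instead, which would explain the "truth value of a Series is ambiguous" message. A sketch of the same call with the format passed through args, using the sample data from the question as an assumed stand-in for the real table:

```python
from datetime import datetime

import pandas as pd

# Sample rows copied from the question (assumed to match the real data).
df = pd.DataFrame({
    "date": ["2021-04-10", "2022-02-03", "2021-09-23", "2021-10-17", "2021-07-17"],
    "sales": [483, 226, 374, 186, 35],
})
format_date = "%Y-%m-%d"

# Each element of the column is passed as the first argument of strptime;
# the tuple given to args supplies the remaining positional arguments.
df["date_new"] = df["date"].apply(datetime.strptime, args=(format_date,))
```

Note that args must be a tuple, hence the trailing comma in (format_date,). Passing the format as a keyword would not work here, because datetime.strptime accepts positional arguments only.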

Ravi Patel
  • No, you shouldn't. You should use the `to_datetime` pandas method. – roganjosh Oct 03 '18 at 18:47
  • Would it not be better to use applymap, because I am working inside a DataFrame and not a Series? – Theo Oct 03 '18 at 18:47
  • @roganjosh, agreed, but Tousalouest asked in comment how to implement with datetime.strptime – Ravi Patel Oct 03 '18 at 18:52
  • The functions `apply` and `applymap` run as Python code because you are passing them an arbitrary function. Other pandas built-ins are written in C and are therefore faster. You should use pandas built-ins as much as possible to increase performance (the same goes for Spark, scikit-learn, NumPy, etc.). :) – IMCoins Oct 03 '18 at 18:52
  • @Tousalouest, applymap operates on each element in a DataFrame, while apply operates along an axis. You only want to use information from one column, so you can make it a Series (`df['date'].map`), apply on each row (my `apply` example), or use applymap on a DataFrame (must be `df[['date']].applymap`) – Ravi Patel Oct 03 '18 at 18:53
  • Thank you for your answer. I will use pd.to_datetime() (and accept it as solution) because it is the optimal solution but your answer better fit my question. – Theo Oct 03 '18 at 18:55
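
To make the three options from the comments concrete, here is a rough side-by-side sketch. The helper name parse and the two-row sample frame are made up for illustration. Recent pandas releases deprecate DataFrame.applymap in favour of DataFrame.map, so the sketch picks whichever one is available.

```python
from datetime import datetime

import pandas as pd

format_date = "%Y-%m-%d"
# Two sample rows taken from the question, for illustration only.
df = pd.DataFrame({"date": ["2021-04-10", "2022-02-03"], "sales": [483, 226]})


def parse(value):
    """Hypothetical helper: parse one date string with the question's format."""
    return datetime.strptime(value, format_date)


# 1. Series.map: the function receives one element of the column at a time.
by_map = df["date"].map(parse)

# 2. DataFrame.apply with axis=1: the function receives one row at a time.
by_row = df.apply(lambda row: parse(row["date"]), axis=1)

# 3. Element-wise over a one-column DataFrame (note the double brackets).
one_col = df[["date"]]
elementwise = one_col.map if hasattr(one_col, "map") else one_col.applymap
by_element = elementwise(parse)["date"]
```

All three produce the same parsed values; they differ only in what gets handed to the function, and all of them loop in Python, unlike pd.to_datetime.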