
I'm trying to add a column to a dataframe based on other existing columns.

The dataframe is in the following format:

col1         col2 
2017-02-1    2017-03-03
2017-02-22   2017-03-06


from datetime import datetime
date_format = "%Y-%m-%d"
df['TimeConsumed']=df['col2'].apply(lambda x: (datetime.strptime(x,date_format)-datetime.strptime(df['col1'],date_format)).days)

Running the above keeps raising:

TypeError: must be string, not Series 

Can anyone please help?

ikel
  • Possible duplicate of [Add column with number of days between dates in DataFrame pandas](http://stackoverflow.com/questions/22132525/add-column-with-number-of-days-between-dates-in-dataframe-pandas) – maxymoo Mar 14 '17 at 01:15

1 Answer

That error happens because you call strptime on a Series, but it only accepts a string:

datetime.strptime(df['col1'], date_format)
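To make the failure concrete, here is a minimal sketch that reproduces it. It assumes the two sample rows from the question and Python 3, where the exact wording of the message may differ from the one quoted in the question.

```python
from datetime import datetime

import pandas as pd

date_format = "%Y-%m-%d"
df = pd.DataFrame({"col1": ["2017-02-01", "2017-02-22"],
                   "col2": ["2017-03-03", "2017-03-06"]})

# A single cell is a plain string, so strptime accepts it.
first = datetime.strptime(df["col1"].iloc[0], date_format)

# The whole column is a Series, so strptime rejects it with a TypeError.
try:
    datetime.strptime(df["col1"], date_format)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```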

I think you want to subtract row by row, so you need to apply the function to each row of the dataframe rather than to a single column, as below:

import pandas as pd
from datetime import datetime


def subtract(df):
    # With df.apply(..., axis=1), `df` here is a single row, so df['col1'] is a string.
    date_format = "%Y-%m-%d"
    return (datetime.strptime(df['col2'], date_format) - datetime.strptime(df['col1'], date_format)).days

if __name__ == '__main__':

    df = pd.DataFrame([{'col1': '2017-02-01', 'col2': '2017-03-03'}, {'col1': '2017-02-22', 'col2': '2017-03-06'}])
    print(df)

    # Original attempt: fails because df['col1'] inside the lambda is the whole Series.
    #date_format = "%Y-%m-%d"
    #df['TimeConsumed']=df['col2'].apply(lambda x: (datetime.strptime(x,date_format)-datetime.strptime(df['col1'],date_format)).days)
    df["TimeConsumed"] = df.apply(subtract, axis=1)
    print(df)

Output:

        col1        col2  TimeConsumed
0  2017-02-01  2017-03-03            30
1  2017-02-22  2017-03-06            12
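
As an alternative to the row-wise apply, the same column can probably be computed without any Python-level loop. This is only a sketch, not part of the answer above: it assumes both columns hold zero-padded "%Y-%m-%d" strings (as in the sample frame) and uses pd.to_datetime plus the .dt.days accessor.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["2017-02-01", "2017-02-22"],
                   "col2": ["2017-03-03", "2017-03-06"]})

# Parse each column once; the result is a datetime64 Series.
start = pd.to_datetime(df["col1"], format="%Y-%m-%d")
end = pd.to_datetime(df["col2"], format="%Y-%m-%d")

# Subtracting two datetime Series gives a timedelta Series; .dt.days extracts whole days.
df["TimeConsumed"] = (end - start).dt.days
print(df)
```

On the sample rows this gives 30 and 12, matching the output above.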
linpingta
  • Just tested, it works. But I'm wondering why "datetime.strptime(x,date_format)" in my code is applied to a Series. I thought that in the lambda function it applies to each cell of that column. Am I wrong about this? – ikel Mar 14 '17 at 03:50
  • Your code also contains "datetime.strptime(df['col2'],date_format)", the same as mine. I'm confused: how does the same code work for you but not for me? – ikel Mar 14 '17 at 04:00
  • When you call df.apply(func, axis=1), func runs on each row, which means df['col1'] and df['col2'] inside func are strings rather than Series. Your code uses df['col2'].apply, so the lambda runs on each element of df['col2'] (a Series), while df['col1'] inside it is still the whole Series rather than a string. – linpingta Mar 14 '17 at 04:25
  • So that means "datetime.strptime(df['col1'],date_format)" should work outside of that lambda function, right? – ikel Mar 14 '17 at 04:28
  • No, datetime.strptime only accepts a string as input. It works in my code because we call apply with that function. From the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html, you can see that apply with axis=1 calls func on each row, which means df['col1'] inside func is actually a string rather than a Series :) – linpingta Mar 14 '17 at 05:23
  • Isn't the part inside the lambda a function too? How do I apply a function with a lambda in this case then? Thanks a lot – ikel Mar 14 '17 at 17:50
  • No, it's not directly related to the lambda, but to what apply is called on. In your original code, you call apply on df['col2'], which means func is called on each element of df['col2'] and never sees the matching col1 value. In my answer, I call apply on df, which means func is called on each row, with both col1 and col2 available. – linpingta Mar 15 '17 at 01:52
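
The difference discussed in these comments can be checked directly by looking at what each kind of apply hands to the function. A small sketch, assuming the same two-row sample frame; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["2017-02-01", "2017-02-22"],
                   "col2": ["2017-03-03", "2017-03-06"]})

# Series.apply passes one cell at a time, so x is a plain string.
cell_types = df["col2"].apply(lambda x: type(x).__name__).tolist()

# DataFrame.apply with axis=1 passes one row at a time as a Series...
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()

# ...so indexing that row by column name yields a single string.
field_types = df.apply(lambda row: type(row["col1"]).__name__, axis=1).tolist()

print(cell_types, row_types, field_types)
```

Inside the lambda from the question, x is a single cell, but df['col1'] still refers to the outer dataframe's whole column, which is why strptime receives a Series there.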