
I have a dataframe with a 'created_at' column whose dtype is object rather than datetime. I need to convert this column to datetime format in order to perform some groupby operations on it.

Here is the dataframe structure:

           id                created_at                                          full_text 
0  1286763394658381824  ['2020-07-24T20:41:14Z']  عدم إصابتك بفايروس كورونا حتى الان مؤشر لأمرين
1  1240341918967459840  ['2020-03-18T18:18:52Z']  رسالة مسربة من داخل #سجن_العقرب تؤكد على تفشى 
2  1243387711995572224  ['2020-03-27T04:01:46Z']  في الافلام الاجنبيه نشاهد امريكا تقود العالم ل
3  1317384182012792832  ['2020-10-17T08:37:19Z']  هناك الكثير من الاكاذيب والفبركات حول لقاح كور
4  1317404463859142656  ['2020-10-17T09:57:55Z']  @kasimf لقاح كورونا ليس هدفه الإضرار بالبشر إن
5  1242851102258868224  ['2020-03-25T16:29:28Z']  بعد تفشي المرض في إيطاليا ولا وجود علاج قرر جم

I tried different ways to convert the 'created_at' column into datetime format, but none of them worked. Here is an example:

    from dateutil.parser import parse
    df['date'] =parse(df['created_at'].astype(str))

This gives me the following error:

    raise TypeError('Parser must be a string or character stream, not '
TypeError: Parser must be a string or character stream, not Series

Edit

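I solved this as follows:

    from dateutil import parser

    def convert_date(date_str):
        return parser.parse(date_str)

    # Each value appears to be a string such as "['2020-07-24T20:41:14Z']",
    # so strip the leading "['" and trailing "']" from every row
    # (indexing with [0] first would reuse the first row's date everywhere).
    df['date'] = df['created_at'].str[2:-2]
    df['date'] = df['date'].apply(convert_date)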

Thanks to you all

Youcef
  • use pd.to_datetime; `df['date'] = pd.to_datetime(df['created_at'])` – FObersteiner Feb 22 '22 at 12:40
  • Thanks for suggestion. This gave me the following error : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: ['2020-07-24T20:41:14Z'] – Youcef Feb 22 '22 at 12:42
  • you can also define the format: `pd.to_datetime(df['created_at'], format="%Y-%m-%dT%H:%M:%S%z")`. – FObersteiner Feb 22 '22 at 12:43
  • ...or take first element from the lists: `pd.to_datetime(df['created_at'].str[0])` – FObersteiner Feb 22 '22 at 12:44
  • The second suggestion also didn't work : ValueError: time data '['2020-07-24T20:41:14Z']' does not match format '%Y-%m-%dT%H:%M:%S%Z' (match) – Youcef Feb 22 '22 at 12:45
  • And the third one gave me this : in parse raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: [ – Youcef Feb 22 '22 at 12:46
  • df['date'] = pd.to_datetime(df['created at'], format="['%Y-%M-%DT%H:%M:%SZ']") Looks like the proper format – ShyGuyRyRyNewlyTheDataGuy Feb 22 '22 at 12:48
  • Thanks. This looks closer to the solution, but it gave me this error : ValueError: 'D' is a bad directive in format '['%Y-%M-%DT%H:%M:%SZ']' – Youcef Feb 22 '22 at 12:50
  • the correct format has a lower case z; `%z`, not `Z` and not `%Z` (upper case). – FObersteiner Feb 22 '22 at 12:50
  • [mre]: `pd.to_datetime(pd.Series([['2020-07-24T20:41:14Z'],['2020-03-18T18:18:52Z']]).str[0])` works fine – FObersteiner Feb 22 '22 at 12:52
  • When I try to replace this by the 'created_at' column : df['date']=pd.to_datetime(df['created_at'].str[0]) It gives me this error : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: [ – Youcef Feb 22 '22 at 12:56
  • then unfortunately, the example you show in your question is not reproducible - what is the actual content of df.created_at ? – FObersteiner Feb 22 '22 at 13:07
  • When I do print(df.dtypes) . I get this : id int64 | created_at object | full_text object – Youcef Feb 22 '22 at 13:10
  • The content of created_at is as shown in the question, values like : ['2020-03-18T18:18:52Z'] – Youcef Feb 22 '22 at 13:12
  • I can't answer, because the question has been flagged as already answered on another post. Yet, the other post, I don't believe answers this question. The problem here is that each value in created at, is a list with 1 element. So this should do the trick: ```pd.to_datetime(df.created_at.apply(lambda x: x[0]))``` – Lalo Oct 04 '22 at 16:43
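The comments above disagree on whether each `created_at` cell is a real one-element list or the string form of one (the `Unknown string format: [` error suggests the latter). A minimal sketch that handles both cases with the standard library only; `unwrap_timestamp` and `to_utc_datetime` are hypothetical helper names, and the sample values are copied from the question:

```python
import ast
from datetime import datetime, timezone

def unwrap_timestamp(value):
    """Return the bare timestamp string from a one-element list or its string form."""
    # A string like "['2020-07-24T20:41:14Z']" is first turned back into a list.
    if isinstance(value, str) and value.strip().startswith("["):
        value = ast.literal_eval(value.strip())
    # A one-element list (or tuple) is reduced to its only element.
    if isinstance(value, (list, tuple)):
        value = value[0]
    return value

def to_utc_datetime(value):
    """Parse the unwrapped timestamp; %z accepts the trailing 'Z' on Python 3.7+."""
    return datetime.strptime(unwrap_timestamp(value), "%Y-%m-%dT%H:%M:%S%z")

samples = [
    ["2020-07-24T20:41:14Z"],      # real list
    "['2020-03-18T18:18:52Z']",    # string that only looks like a list
    "2020-03-27T04:01:46Z",        # already a plain string
]
parsed = [to_utc_datetime(v) for v in samples]
```

On a dataframe, the same helper could presumably be applied per cell before conversion, for example `pd.to_datetime(df['created_at'].apply(unwrap_timestamp))`.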

1 Answer


I think it is

    df['date'] = pd.to_datetime(df['created_at'])
Henry Ecker
finix
  • from the OP's example, it seems to me that `created_at` is a pandas Series, so your code won't work. – FObersteiner Feb 22 '22 at 12:42
  • Thanks. However, it looks like the suggestion contains a syntax error. I am getting this : NameError: name 'created_at' is not defined – Youcef Feb 22 '22 at 12:43
  • I think conversion should be something like this: df['date'] =df['created_at'].astype('datetime64[ns]') – ilhank Feb 22 '22 at 12:46
  • no it was ```df['date'] = pd.to_datetime(df['created_at']) ``` – finix Feb 22 '22 at 12:47
  • @ilhank still didn't solve it . It gave me this : dateutil.parser._parser.ParserError: Unknown string format: ['2020-07-24T20:41:14Z'] – Youcef Feb 22 '22 at 12:48
  • @finix Same here. I am getting this : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: ['2020-07-24T20:41:14Z'] – Youcef Feb 22 '22 at 12:49
  • what about this one ? df['date']= pd.to_datetime(df['created_at'], format="['%Y-%M-%DT%H:%M:%SZ']") – ilhank Feb 22 '22 at 12:49
  • @ilhank Same :( . I am getting this : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: ['2020-07-24T20:41:14Z'] – Youcef Feb 22 '22 at 12:51
  • ```pd.to_datetime(pd.Series([['2020-07-24T20:41:14Z'],['2020-03-18T18:18:52Z']]).str[0])``` – finix Feb 22 '22 at 12:52
  • @finix This works. But when I try to replace this by the 'created_at' column : df['date']=pd.to_datetime(df['created_at'].str[0]) It gives me this error : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: [ – Youcef Feb 22 '22 at 12:56
  • can you try this df['date']= pd.to_datetime(df['created_at'],infer_datetime_format=True) – ilhank Feb 22 '22 at 13:01
  • @ilhank It still gives me the same error : raise ParserError("Unknown string format: %s", timestr) dateutil.parser._parser.ParserError: Unknown string format: ['2020-07-24T20:41:14Z'] – Youcef Feb 22 '22 at 13:03
  • This ignores the fact that `created_at` is a column with a list of strings in each element. – Lalo Oct 04 '22 at 16:45
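For reference, the whole flow discussed in this thread can be sketched end to end in pandas. This assumes, as the error messages suggest, that each cell is a string like `"['2020-07-24T20:41:14Z']"`; the small dataframe is a stand-in built from the question's sample values with made-up ids, and the monthly count is only one possible groupby:

```python
import pandas as pd

# Stand-in for the real data: timestamps stored as strings that look like lists.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "created_at": [
        "['2020-07-24T20:41:14Z']",
        "['2020-03-18T18:18:52Z']",
        "['2020-03-27T04:01:46Z']",
    ],
})

# Check what each cell really holds (str vs. list) before choosing a fix.
print(df["created_at"].map(type).value_counts())

# Strip the surrounding brackets and quotes from every row.
cleaned = df["created_at"].str.strip("[]'")

# Parse the ISO 8601 strings; utc=True yields a timezone-aware UTC column.
df["date"] = pd.to_datetime(cleaned, utc=True)

# Example groupby: number of rows per calendar month.
per_month = df.groupby(df["date"].dt.month).size()
```

If the type check reports `list` instead of `str`, `df["created_at"].str[0]` would be the step to use in place of `.str.strip("[]'")`.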