0

I have a data frame which is of the stop and search of the police in London; the 'date' column has entries like the 2019-10-01T04:31:39+00:00 2019-10-01T04:31:39+.. in each row I'm trying to extract day month and year to be able to begin analysis.

Could anyone please assist me in doing this??

  • Are all the dates in the format "YYYY-MM-DD" followed by timestamp? – Harsh Dec 07 '20 at 02:51
  • they are actually messed up each input is like this `'2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04:39:27+00:00 2019-10-01T04....` repeating itself, I would first need to cut everything out – KappamanOosh Dec 07 '20 at 02:54
  • can you post a sample of the dataframe? the text from df.to_dict() (or df.head(20).to_dict()`?) – anon01 Dec 07 '20 at 02:59
  • Each input has a string of repeating dates n times? – Harsh Dec 07 '20 at 02:59
  • I've now refresh and it isn't repeated here is a print of the the dict() of my dataframe; `'Date': {0: '2019-10-01T04:31:39+00:00', 1: '2019-10-01T04:39:27+00:00', 2: '2019-10-01T04:48:13+00:00', 3: '2019-10-01T11:21:50+00:00', 4: '2019-10-02T01:02:26+00:00', 6: '2019-10-02T03:06:24+00:00', 7: '2019-10-02T04:17:26+00:00', 9: '2019-10-02T10:24:53+00:00', 10: '2019-10-03T01:31:12+00:00',` – KappamanOosh Dec 07 '20 at 03:34

2 Answers2

0
df = df[0].str.split(expand=True)
df = df.astype("datetime64").rename(columns={0:"start", 1:"end"})
df = df.astype("datetime64").rename(columns={0:"start", 1:"end"})

df[["start_day", "start_month", "start_year"]] = df.start.transform(lambda x: pd.Series([x.day, x.month, x.year]))
df[["end_day", "end_month", "end_year"]] = df.end.transform(lambda x: pd.Series([x.day, x.month, x.year]))

output:

                start                 end  start_day  start_month  start_year  end_start  end_month  end_year
0 2019-10-01 04:31:39 2019-10-01 04:31:39          1           10        2019          1         10      2019
anon01
  • 10,618
  • 8
  • 35
  • 58
0

Take a look at how datetime strings are converted to date (or date time) objects in python. For your scenario apply a function to the whole column of date strings that you reading from source.

def convert_str_to_date(date_string):
    converted_date = datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%S%z")
    return converted_date.month, converted_date.day, converted_date.year


>>> df
         name_of_date_column
0  2019-10-01T04:31:39+00:00
1  2019-10-01T04:31:39+00:00

df.apply(lambda x: convert_str_to_date(x['name_of_date_column']), axis=1)
# 0    (10, 1, 2019)
# 1    (10, 1, 2019)
# dtype: object

The first line of this function is doing the work of converting string to respective elements of datetime.

  • note: there's [fromisoformat](https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat), see e.g. https://stackoverflow.com/questions/13468126/a-faster-strptime - but pd.to_datetime likely is more convenient here. – FObersteiner Dec 07 '20 at 08:38
  • While that is true, the value in date-string can appear in a non-dataframe scenario as well. Hence, a more python _ish_ hack – dragonfire_007 Dec 07 '20 at 08:41
  • then I'd use `datetime.fromisoformat(s).timetuple()[:3]`, giving me a y-m-d tuple – FObersteiner Dec 07 '20 at 08:45
  • 1
    That was a learning for me. Changing few things in my code now. Thanks !! – dragonfire_007 Dec 07 '20 at 09:19