Split pandas column into two by a number (containing time)

Question

I have a dataframe:

col_1
Agent AB 7:00 AM
Agent AB 7:00 AM
Cust XY 8:00 AM
Cust XY 9:00 AM
Agent AB 11:00 AM

I want to split it into 2 columns such that the time gets split into a new column.

Expected Output:

col_1        col_2
Agent AB     7:00 AM
Agent AB     7:00 AM
Cust XY      8:00 AM
Cust XY      9:00 AM
Agent AB     11:00 AM

I researched this and found that it can be done with string slicing.

Something like:

df['col_2'] = df['col_1'].str[-8:-1]

Is there a better way to do it?

Does this answer your question? [Extracting date from a string in Python](https://stackoverflow.com/questions/3276180/extracting-date-from-a-string-in-python) — sushanth, Apr 26 '21 at 08:45

score 6 · Answer 1 · answered Apr 26 '21 at 08:48

df["col_1"].str.extract(r"^(\D+)(.+)$").rename(columns={0: "col_1", 1: "col_2"})

gives

       col_1     col_2
0  Agent AB    7:00 AM
1  Agent AB    7:00 AM
2   Cust XY    8:00 AM
3   Cust XY    9:00 AM
4  Agent AB   11:00 AM

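The regex captures a run of consecutive non-digits with (\D+), then captures everything from the first digit onward with (.+). The resulting columns are then renamed. Note that the first group keeps the space that precedes the time.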
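For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same pattern. The variable name `parts` and the extra `.str.strip()` step are my own additions, not part of the answer above; the strip only removes the space that (\D+) leaves at the end of the first column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col_1": [
        "Agent AB 7:00 AM",
        "Agent AB 7:00 AM",
        "Cust XY 8:00 AM",
        "Cust XY 9:00 AM",
        "Agent AB 11:00 AM",
    ]
})

# Two capture groups produce a two-column DataFrame with columns 0 and 1
parts = df["col_1"].str.extract(r"^(\D+)(.+)$")
parts.columns = ["col_1", "col_2"]

# Assumption: the trailing space kept by (\D+) is not wanted in the name column
parts["col_1"] = parts["col_1"].str.strip()

print(parts)
```

This approach assumes the name part never contains a digit; a name such as "Agent 7" would be split at the wrong place.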

RavinderSingh13 · Accepted Answer · 2021-04-26T09:50:42.807

4

With the samples you have shown, please try the following.

import pandas as pd
df["col_1"].str.extract(r"^(.*?)\s+(\d{1,2}:\d{1,2} [AP]M)$").rename(columns={0: "col_1", 1: "col_2"})


Explanation: a detailed breakdown of the regex above.

^(.*?)                      ## 1st capturing group: a non-greedy match from the start of the value, up to the whitespace that precedes the time.
\s+                         ## One or more whitespace characters.
(\d{1,2}:\d{1,2} [AP]M)$    ## 2nd capturing group: 1 or 2 digits, a colon, 1 or 2 digits, a space, then AM or PM, anchored at the end of the value.

Output with the shown samples:

      col_1    col_2
0  Agent AB  7:00 AM
1  Agent AB  7:00 AM
2   Cust XY  8:00 AM
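The comments on this answer report NaN in both columns on the real data, and mention trailing whitespace. One plausible cause, sketched below, is that the `$` anchor cannot match when a space follows "AM". The two rows are hypothetical, modeled on the values quoted in the comments, and the names `raw` and `cleaned` are mine.

```python
import pandas as pd

# Hypothetical rows based on the values quoted in the comments (note the trailing space)
df = pd.DataFrame({"col_1": ["MANAU BADA 6:58 AM ", "WONG AI NEIG 6:58 AM "]})

pattern = r"^(.*?)\s+(\d{1,2}:\d{1,2} [AP]M)$"

# Without cleaning, the trailing space stops `$` from matching, so every cell is NaN
raw = df["col_1"].str.extract(pattern)

# Stripping surrounding whitespace first lets the same pattern match
cleaned = df["col_1"].str.strip().str.extract(pattern)
cleaned.columns = ["col_1", "col_2"]

print(raw)
print(cleaned)
```

An alternative, if you would rather not modify the column, is to end the pattern with `\s*$` so that optional trailing whitespace is tolerated.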

edited Apr 26 '21 at 09:50

answered Apr 26 '21 at 09:01


much more robust! – Mustafa Aydın Apr 26 '21 at 09:07
@RavinderSingh13 It's giving me NaN's in both the columns :( – Shubham R Apr 26 '21 at 09:41
@ShubhamR, oh, is your actual df the same as the shown samples, or is it different? Kindly let me know. – RavinderSingh13 Apr 26 '21 at 09:42
@RavinderSingh13 Yeah, it was a sample dataset. The code given by Mustafa works fine on the dataset though – Shubham R Apr 26 '21 at 09:43
@ShubhamR, with your shown dataset this worked fine for me. The regex only matches values of the same kind as the ones in your input samples. – RavinderSingh13 Apr 26 '21 at 09:44
@RavinderSingh13 Yeah, I looked at the same, don't know why this doesn't work. Instead of "Agent AB" replace with "MANAU BADA 6:58 AM " and "Cust XY" with "WONG AI NEIG 6:58 AM " – Shubham R Apr 26 '21 at 09:47
@RavinderSingh13 and with different time. One more thing to observe is the trailing whitespaces – Shubham R Apr 26 '21 at 09:48
@ShubhamR, could you please give one line for which it isn't working? I need to check it. – RavinderSingh13 Apr 26 '21 at 09:49
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231605/discussion-between-shubham-r-and-ravindersingh13). – Shubham R Apr 26 '21 at 09:49
@ShubhamR, Here is [regex demo for your 2nd shown samples](https://regex101.com/r/b3v9k5/1) which shows my solution's regex worked fine for me. – RavinderSingh13 Apr 26 '21 at 09:54
