3

I have a dataframe:

col_1
Agent AB 7:00 AM
Agent AB 7:00 AM
Cust XY 8:00 AM
Cust XY 9:00 AM
Agent AB 11:00 AM

I want to split it into 2 columns such that the time gets split into a new column.

Expected Output:

col_1        col_2
Agent AB     7:00 AM
Agent AB     7:00 AM
Cust XY      8:00 AM
Cust XY      9:00 AM
Agent AB     11:00 AM

I researched and found out that this can be done using: string slicing.

Something like:

df['col_2'] = df['col_1'].str[-8:-1]

Is there a better way to do it??

Shubham R
  • 7,382
  • 18
  • 53
  • 119

2 Answers2

6
df["col_1"].str.extract(r"^(\D+)(.+)$").rename(columns={0: "col_1", 1: "col_2"})

gives

       col_1     col_2
0  Agent AB    7:00 AM
1  Agent AB    7:00 AM
2   Cust XY    8:00 AM
3   Cust XY    9:00 AM
4  Agent AB   11:00 AM

Regex is looking for consecutive non-digits (\D+) and then the first digit on is captured with (.+). We then rename columns.

Mustafa Aydın
  • 17,645
  • 4
  • 15
  • 38
4

With your shown samples, could you please try following.

import pandas as pd
df["col_1"].str.extract(r"^(.*?)\s+(\d{1,2}:\d{1,2} [AP]M)$").rename(columns={0: "col_1", 1: "col_2"})

Online demo for above regex

Explanation: Adding detailed explanation for above regex.

^(.*?)                      ##Creating 1st capturing group, matching from starting of value and doing a non-greedy match(till followed by spaces 1 or more occurrences).
\s+                         ##Mentioning spaces 1 or more spaces here.
(\d{1,2}:\d{1,2} [AP]M)$    ##Creating 2nd capturing group, matching digits 1 or 2 numbers followed by : matching 1 or 2 digits followed by space and AM/PM.

Output with shown samples is coming as follows:

      col_1    col_2
0  Agent AB  7:00 AM
1  Agent AB  7:00 AM
2   Cust XY  8:00 AM
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93