Formatting DataFrame in python by removing all text after time

Question

I have a pandas DataFrame in Python, and I would like to remove everything that follows the time (or time range) in each row.

Here is the head():

  Time
0                      12:00 am (Begins at 14:00)
1                              10:00 - 16:00(EDT)
2                                1:00 - 3:00(CDT)
3                                     1:00 - 7:00
4                                    9:00 - 10:00

I would like the output to be

  Time
0                                   12:00
1                                   10:00 - 16:00
2                                   1:00 - 3:00
3                                   1:00 - 7:00
4                                   9:00 - 10:00

Thanks!

possible duplicate https://stackoverflow.com/questions/13682044/pandas-dataframe-remove-unwanted-parts-from-strings-in-a-column — julian salas, Jul 21 '17 at 00:13
@juliansalas not quite as I'm not sure how to adapt that to my code — z12332, Jul 21 '17 at 00:27
Possible duplicate of [Pandas DataFrame: remove unwanted parts from strings in a column](https://stackoverflow.com/questions/13682044/pandas-dataframe-remove-unwanted-parts-from-strings-in-a-column) — DYZ, Jul 21 '17 at 00:58

score 0 · Answer 1 · answered Jul 21 '17 at 01:35

Here you go. I used a regex to find the first character that isn't a dash, digit, colon, or whitespace. I then took the substring up to that index, stripped any trailing whitespace, and wrote the result back into the df.

from pandas import DataFrame
import re

df = DataFrame(columns = ["time"])
df.loc[0] =  "12:00 am (Begins at 14:00)"
df.loc[1] =  "10:00 - 16:00 (EDT)"
df.loc[2] =  "1:00 - 3:00 (CDT)"
df.loc[3] =  "1:00 - 7:00"
df.loc[4] =  "9:00 - 10:00"

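for i in range(len(df)):
    t = df.loc[i, "time"]
    # Find the first character that is not a digit, colon, whitespace, or dash.
    m = re.search(r"[^\d:\s-]", t)
    if m:
        # Write back with a single .loc call; the chained form
        # df.loc[i]["time"] = ... may assign to a temporary copy.
        df.loc[i, "time"] = t[:m.start()].strip()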

Output:

            time
0          12:00
1  10:00 - 16:00
2    1:00 - 3:00
3    1:00 - 7:00
4   9:00 - 10:00
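
The same cut-at-the-first-unwanted-character idea can probably be written without the explicit loop. This is only a sketch, not the answer's original code: it assumes a pandas version where `Series.str.replace` accepts `regex=True`, and it extends the pattern with `.*$` so the match runs to the end of the string.

```python
import pandas as pd

df = pd.DataFrame({"time": [
    "12:00 am (Begins at 14:00)",
    "10:00 - 16:00 (EDT)",
    "1:00 - 3:00 (CDT)",
    "1:00 - 7:00",
    "9:00 - 10:00",
]})

# Delete everything from the first character that is not a digit, colon,
# whitespace, or hyphen up to the end of the string, then trim whitespace.
df["time"] = df["time"].str.replace(r"[^\d:\s-].*$", "", regex=True).str.strip()

print(df)
```

Rows that contain only digits, colons, spaces, and hyphens have no match and are left unchanged.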

score 0 · Answer 2 · answered Jul 21 '17 at 01:52

It is a two-step process given your example (you could write it as one function):

import pandas as pd
import re # regex module

df = pd.DataFrame({'Time': ['12:00 am (Begins at 14:00)',
                            '10:00 - 16:00(EDT)',
                            '1:00 - 3:00(CDT)',
                            '1:00 - 7:00',
                            '9:00 - 10:00']})



# Step 1: remove any parenthesised text, e.g. "(EDT)", using a regex
df['Time'] = df['Time'].str.replace(r'\([^)]*\)', '', regex=True)

# Step 2: remove the remaining unwanted characters (e.g. 'am'), then strip whitespace
df['Time'] = df['Time'].str.replace(r'[^\d:\- ]', '', regex=True).str.strip()
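
As for writing it as one function, a minimal sketch could look like the following. The name `clean_time` is hypothetical (it does not appear in the answer), and the function uses only the standard `re` module so it can be tried on plain strings first.

```python
import re

def clean_time(text):
    """Return only the time or time range from one raw string (hypothetical helper)."""
    # Step 1: drop any parenthesised note such as "(EDT)" or "(Begins at 14:00)".
    text = re.sub(r"\([^)]*\)", "", text)
    # Step 2: drop every character that is not a digit, colon, hyphen, or space.
    text = re.sub(r"[^\d:\- ]", "", text)
    # Trim the whitespace left behind by the removals.
    return text.strip()

raw = [
    "12:00 am (Begins at 14:00)",
    "10:00 - 16:00(EDT)",
    "1:00 - 3:00(CDT)",
    "1:00 - 7:00",
    "9:00 - 10:00",
]
cleaned = [clean_time(s) for s in raw]
print(cleaned)
```

On the DataFrame itself this should be applicable with `df['Time'] = df['Time'].map(clean_time)`, assuming every value in the column is a string.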
