pandas series string extraction using regular expression : How to exclude certain symbols from the beginning?

Question

I want to extract data from a column where each cell is a string listing a hotel's room numbers and the packages occupying them at a given time. Each cell looks like the following:

                          624: COUPLE , 507: DELUXE+ ,301: HONEYMOON

Here's the code snippet I have written to collect all the room numbers occupied and the packages purchased.

import numpy as np
import pandas as pd
d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON','614:FAMILY , 507: FAMILY+'])
df = pd.Series(d)
df= df.str.extractall(r'(?P<room>[0-9]+)(?P<package>[\S][^,]+)')
df

However, the output keeps the colon in front of the package name.

How do I remove the colon in front of the package name in the output?

score 1 · Accepted Answer · answered Feb 20 '21 at 15:22

You can put a colon followed by an optional-whitespace pattern (:\s*) between the two named capturing groups, so that these characters are matched but not captured:

>>> df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<package>[^\s,]+)')
        room    package
  match                
0 0      624     COUPLE
  1      507    DELUXE+
  2      301  HONEYMOON
1 0      614     FAMILY
  1      507    FAMILY+

Details:

(?P<room>[0-9]+) - Group "room": one or more digits
:\s* - a colon and then zero or more whitespaces
(?P<package>[^\s,]+) - Group "package": one or more chars other than whitespace and a comma.
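As a rough side-by-side check, the sketch below runs the question's original pattern and the pattern above on the same sample data. The variable names are illustrative only. The last pattern, `multiword`, and its sample row are hypothetical and not part of the original question: they assume package names may contain spaces, which `[^\s,]+` would cut off at the first space.

```python
import pandas as pd

# Sample data taken from the question.
s = pd.Series([
    "624: COUPLE , 507: DELUXE+ ,301: HONEYMOON",
    "614:FAMILY , 507: FAMILY+",
])

# Pattern from the question: [\S] consumes the colon, so it ends up
# inside the "package" group.
original = r"(?P<room>[0-9]+)(?P<package>[\S][^,]+)"

# Pattern from the answer: the colon and any whitespace after it are
# matched between the groups and therefore not captured.
fixed = r"(?P<room>[0-9]+):\s*(?P<package>[^\s,]+)"

before = s.str.extractall(original)
after = s.str.extractall(fixed)

print(before["package"].tolist())
# [': COUPLE ', ': DELUXE+ ', ': HONEYMOON', ':FAMILY ', ': FAMILY+']
print(after["package"].tolist())
# ['COUPLE', 'DELUXE+', 'HONEYMOON', 'FAMILY', 'FAMILY+']

# Hypothetical variation (an assumption, not from the question): allow
# spaces inside a package name while still dropping trailing whitespace.
multiword = r"(?P<room>[0-9]+):\s*(?P<package>[^,]*[^\s,])"
extra = pd.Series(["101: KING SUITE , 102: TWIN"])  # made-up sample row
print(extra.str.extractall(multiword)["package"].tolist())
# ['KING SUITE', 'TWIN']
```

Note that both groups come back as strings; if numeric room numbers are needed, the "room" column can be converted afterwards, for example with `astype(int)`.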

Wiktor Stribiżew

Now I need to create a dedicated column for each room, where each cell under those columns might be occupied or available. What would be an efficient way to do so? For example, a single column 'room#507' would be added to the original dataset, with {DELUXE+, FAMILY+, ...} as values. – BN production Feb 20 '21 at 15:59
@BNproduction If you have a new question, please consider accepting this one and asking a new question – Wiktor Stribiżew Feb 20 '21 at 16:00
Please help me with this topic: https://stackoverflow.com/questions/66301681/create-pandas-column-using-cell-values-of-another-multi-indexed-data-frame – BN production Feb 21 '21 at 11:15
