
I am trying to use a lambda and a regex to extract text from a string column in a pandas DataFrame. The regex is right and I can fill a new column with the correct data, but each value is surrounded by [ ].

Code to build dataframe:

import re
import pandas as pd

carTypes = {'Car Class Description' : ['A - ECAR - Economy',
            'C - ICAR - Intermediate',
            'D - DCAR - Full Size',
            'E - FFAR - Large SUV - 5 Seater',
            'E1 - GFAR - Large SUV - 7 Seater']}

df_carTypes = pd.DataFrame(carTypes)

Code to apply regex to each row in dataframe and generate and populate a new column with result:

df_carTypes['Car Class Code'] = df_carTypes['Car Class Description'].apply(lambda x: re.findall(r'^\w{1,2}',x))

Result:

I get the new column as required with the right values, but each one is wrapped in [ ], e.g. [A] instead of A.

Can someone assist?

Sorry I can't format better...

PacketLoss
Ianh

2 Answers


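This happens because re.findall() returns a list of all matches, even when there is only one. Use re.search() with .group() to get the first match as a plain string. Note that re.search() returns None when nothing matches, so .group() would raise an AttributeError on such a row.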

df_carTypes['Car Class Code'] = df_carTypes['Car Class Description'].apply(lambda x: re.search(r'^\w{1,2}',x).group())

Result:

              Car Class Description Car Class Code
0                A - ECAR - Economy              A
1           C - ICAR - Intermediate              C
2              D - DCAR - Full Size              D
3   E - FFAR - Large SUV - 5 Seater              E
4  E1 - GFAR - Large SUV - 7 Seater             E1
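
If some rows might not start with a code, a small helper can guard against the None case. This is only a sketch: the helper name first_code and the sample string without a code are made up for illustration, while the pattern is the one from the question.

```python
import re

# Same pattern as in the question, compiled once for reuse.
CODE_PATTERN = re.compile(r'^\w{1,2}')

def first_code(text):
    """Return the leading one- or two-character code, or None if there is none."""
    match = CODE_PATTERN.search(text)
    # re.search returns None on no match, so check before calling .group()
    return match.group() if match else None

print(first_code('E1 - GFAR - Large SUV - 7 Seater'))  # E1
print(first_code(' - no leading code'))                # None
```

The helper can then be passed straight to apply, e.g. df_carTypes['Car Class Description'].apply(first_code).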
PacketLoss

This is because re.findall returns a list of strings, and the string representation of a list includes the square brackets.
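
A quick way to see this outside pandas, using one of the strings from the question (the variable names are just for illustration):

```python
import re

text = 'A - ECAR - Economy'
found = re.findall(r'^\w{1,2}', text)

print(type(found).__name__)  # list
print(str(found))            # ['A']
print(found[0])              # A
```

Each cell of the new column holds such a one-element list, which is why it is displayed with brackets. Indexing with [0] would also work, but it fails on an empty list when nothing matches.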

A tip for using pandas: reduce your use of apply and explore the built-in string methods instead. They are convenient and usually faster. Here's one way to do it with str.extract:

df_carTypes['Car Class Code'] = df_carTypes['Car Class Description'].str.extract(r'^(\w{1,2})', expand=False)
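
For reference, a self-contained sketch of the same idea on a smaller frame. The third row is an invented example with no leading code, added to show what happens when the pattern does not match; expand=False is used so that the single capture group comes back as a Series.

```python
import pandas as pd

df = pd.DataFrame({'Car Class Description': ['A - ECAR - Economy',
                                             'E1 - GFAR - Large SUV - 7 Seater',
                                             ' - no leading code']})

# One capture group plus expand=False gives a Series of plain strings.
codes = df['Car Class Description'].str.extract(r'^(\w{1,2})', expand=False)

print(codes.tolist())
print(codes.isna().tolist())  # [False, False, True]
```

Unlike re.search(...).group(), a row without a match does not raise an error here; it simply ends up as a missing value.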
Code Different