How to extract first two characters from string using regex

Question

reference: Pandas DataFrame: remove unwanted parts from strings in a column

This is in reference to an answer provided in the link above. I've researched regular expressions a little and plan to dive deeper, but in the meantime I could use some help.

My dataframe is something like:

df:

  c_contofficeID
0           0109
1           0109
2           3434
3         123434  
4         1255N9
5           0109
6         123434
7           55N9
8           5599
9           0109

Pseudocode

If the first two characters are 12, remove them. Alternatively, add a 12 to the values that don't start with 12.

Result would look like:

  c_contofficeID
0           0109
1           0109
2           3434
3           3434  
4           55N9
5           0109
6           3434
7           55N9
8           5599
9           0109

I'm using the answer from the link above as a starting point:

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')

I've tried the following:

Attempt 1)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'[1][2]',value=r'')

Attempt 2)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'$[1][2]',value=r'')

Attempt 3)

df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'?[1]?[2]',value=r'')

What if you have "1234"? Should "12" be retained in that case or discarded? — Nathan Davis, Oct 26 '16 at 23:05

piRSquared · Accepted Answer · 2016-10-26T23:46:44.910

2

new answers
per comment from @Addison

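# '^12(?=.{4}$)' matches a leading 12 only when it is followed by exactly 4 more characters
# regex=True is needed on recent pandas versions, where str.replace treats the pattern literally by default
df.c_contofficeID.str.replace(r'^12(?=.{4}$)', '', regex=True)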
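
A minimal, self-contained sketch of the lookahead approach, run against the sample IDs from the question. It assumes the column holds strings (otherwise the leading zeros would already be lost); the names `df` and `cleaned` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question; IDs are stored as strings.
df = pd.DataFrame({
    "c_contofficeID": ["0109", "0109", "3434", "123434", "1255N9",
                       "0109", "123434", "55N9", "5599", "0109"]
})

# ^12        : "12" at the very start of the string
# (?=.{4}$)  : lookahead requiring exactly four more characters before the end
cleaned = df["c_contofficeID"].str.replace(r"^12(?=.{4}$)", "", regex=True)

print(cleaned.tolist())
# ['0109', '0109', '3434', '3434', '55N9', '0109', '3434', '55N9', '5599', '0109']
```

Note that `str.replace` returns a new Series; assign it back to the column if the change should be kept.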

If IDs must always have four characters, it's simpler to keep only the last four:

df.c_contofficeID.str[-4:]
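
A short sketch of the slicing variant on a few of the question's values. The Series name `ids` is hypothetical. Keep in mind that slicing drops any prefix, not only `12`, so it is only appropriate if every valid ID really is four characters long.

```python
import pandas as pd

ids = pd.Series(["0109", "123434", "1255N9", "55N9"])

# .str[-4:] keeps the last four characters of each string;
# strings that are already four characters long are unchanged.
last_four = ids.str[-4:]

print(last_four.tolist())
# ['0109', '3434', '55N9', '55N9']
```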

old answer
use str.replace

df.c_contofficeID.str.replace(r'^12', '', regex=True).to_frame()

edited Oct 26 '16 at 23:46

answered Oct 26 '16 at 22:51


1

This is dangerous, as it won't work for `1234`. Please use something like `^12(?=.{4}$)` – Addison Oct 26 '16 at 23:35
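
To make the concern in this comment concrete, here is a small sketch comparing the two patterns on a four-character ID that happens to start with `12`. The names `ids`, `naive`, and `guarded` are illustrative only.

```python
import pandas as pd

ids = pd.Series(["1234", "123434", "1255N9", "0109"])

# Plain anchor: strips any leading "12", including from the valid ID "1234".
naive = ids.str.replace(r"^12", "", regex=True)

# Anchor plus lookahead: strips "12" only when exactly four characters remain.
guarded = ids.str.replace(r"^12(?=.{4}$)", "", regex=True)

print(naive.tolist())    # ['34', '3434', '55N9', '0109']
print(guarded.tolist())  # ['1234', '3434', '55N9', '0109']
```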
