Remove a specific combination of characters in a DataFrame column?

Question

I have some data containing a specific combination of characters that I need to remove, for example:

data_col
*.test1.934n
test1.tedsdh
*.test1.test.sdfsdf
jhsdakn
*.test2.test

I need to remove every instance of the "*." character combination in the DataFrame column. So far I've tried:

df['data_col'].str.replace('^*.','')

However when I run the code it gives me this error:

re.error: nothing to repeat at position 1

Any advice on how to fix this? Thanks in advance.
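The error comes from Python's regular-expression engine, not from pandas itself, so it can be reproduced with the standard `re` module alone. A minimal sketch (the variable names are illustrative only):

```python
import re

# The pattern from the question: '^' is an anchor, so the '*' right after it
# has no preceding character or group to repeat.
bad_pattern = '^*.'

try:
    re.compile(bad_pattern)
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)  # nothing to repeat at position 1

# Escaping '*' and '.' with backslashes makes them match literally.
good_pattern = re.compile(r'^\*\.')
fixed = good_pattern.sub('', '*.test1.934n')
print(fixed)  # test1.934n
```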

Ilya V. Schurov · Accepted Answer · 2022-05-04T09:44:57.847


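The default behaviour of .str.replace in pandas 1.4.2 and earlier is to treat the pattern as a regular expression. In a regular expression, * and . have special meaning: * repeats the preceding item and . matches any character. In '^*.' the * follows the ^ anchor, so there is nothing for it to repeat, which is exactly what the error says. To match these characters literally, escape them with backslashes: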

df['data_col'].str.replace(r'^\*\.', '', regex=True)

Note that I used a raw string literal so that the backslashes are passed to the regex engine unchanged. I also added regex=True explicitly, because otherwise pandas warns that the default will change in a future version (pandas 2.0 did change the default to regex=False). Because of the ^ at the beginning, this regex only matches at the start of each string.
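A minimal end-to-end sketch on the sample data from the question, assuming a pandas version that accepts the `regex` keyword (1.x or 2.x). Note that `.str.replace` returns a new Series rather than modifying the column in place, so the result is assigned back:

```python
import pandas as pd

# Sample values from the question.
df = pd.DataFrame({'data_col': ['*.test1.934n', 'test1.tedsdh',
                                '*.test1.test.sdfsdf', 'jhsdakn',
                                '*.test2.test']})

# Raw string: the backslashes reach the regex engine unchanged.
# '^' anchors the match to the start of each value; regex=True is explicit,
# so the behaviour does not depend on the pandas version's default.
df['data_col'] = df['data_col'].str.replace(r'^\*\.', '', regex=True)

print(df['data_col'].tolist())
# ['test1.934n', 'test1.tedsdh', 'test1.test.sdfsdf', 'jhsdakn', 'test2.test']
```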

However, it is also possible that you don't need regular expressions in this particular case at all.

If you want to remove every instance of *. in your strings (not only those at the beginning), you can simply do:

df['data_col'].str.replace('*.', '', regex=False)
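A small sketch of the literal replacement. The value 'a*.b*.c' is hypothetical (it is not in the question's data) and is included only to show that occurrences in the middle of a string are removed too:

```python
import pandas as pd

# Two values from the question plus one hypothetical value with '*.' inside.
s = pd.Series(['*.test1.934n', 'jhsdakn', 'a*.b*.c'])

# regex=False: '*.' is matched as a plain substring, so no escaping is needed,
# and every occurrence is removed, not just a leading one.
cleaned = s.str.replace('*.', '', regex=False)

print(cleaned.tolist())  # ['test1.934n', 'jhsdakn', 'abc']
```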

If you want to remove *. only at the beginning of the string, you can use .str.removeprefix instead (available since pandas 1.4):

df['data_col'].str.removeprefix('*.')
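A sketch contrasting `removeprefix` with the literal replacement, assuming pandas 1.4 or later. As before, 'a*.b*.c' is a hypothetical value added for illustration; since it does not start with '*.', `removeprefix` leaves it untouched:

```python
import pandas as pd

# Two values from the question plus one hypothetical value.
s = pd.Series(['*.test1.934n', 'test1.tedsdh', 'a*.b*.c'])

# removeprefix strips '*.' only when a value starts with it.
stripped = s.str.removeprefix('*.')

print(stripped.tolist())  # ['test1.934n', 'test1.tedsdh', 'a*.b*.c']
```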

edited May 04 '22 at 09:44

answered May 03 '22 at 21:48

Ilya V. Schurov


Maybe they also need to remove the `^` if it is **all** instances and not just those at the beginning – Adelin May 03 '22 at 21:51
@Adelin, thanks, I updated the answer to take this possibility into account. – Ilya V. Schurov May 04 '22 at 09:45
Yeah, that did the trick. Whenever a new character comes in, do I need to escape it by adding a backslash? (sorry for my bad English) – silentninja89 May 04 '22 at 18:47
@silentninja89 it depends on which character you add: some characters (like digits or letters) are matched literally, while others have special meaning. You can learn more about regular expressions e.g. on http://regex101.com. But I want to reiterate that you probably don't need regular expressions at all for this problem, since you are only removing fixed substrings, not more complex patterns. – Ilya V. Schurov May 04 '22 at 21:10
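If a regex really is needed (for example to keep the ^ anchor), escaping characters by hand can be avoided: Python's `re.escape` backslash-escapes every special character in a literal string. A minimal sketch; the variable `unwanted` is a hypothetical stand-in for whatever substring you need to remove:

```python
import re

import pandas as pd

s = pd.Series(['*.test1.934n', 'test1.tedsdh', '*.test2.test'])

# Hypothetical substring to remove; it may contain any regex metacharacters.
unwanted = '*.'

# re.escape turns '*.' into '\*\.', so the pattern matches it literally.
# The leading '^' restricts the match to the start of each value.
pattern = '^' + re.escape(unwanted)
cleaned = s.str.replace(pattern, '', regex=True)

print(pattern)           # ^\*\.
print(cleaned.tolist())  # ['test1.934n', 'test1.tedsdh', 'test2.test']
```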
