
For example, I have a pandas DataFrame with a string column, and I would like to delete the **bold** text that precedes a substring:

Column1
**Yon-RM-**CT 500M
**Abib-RM-**CT 500M
**Wal-RM-**CT 500M
**Sopxc-RM-**CT 1000M

Notice that the bold text can vary in length, but it always ends with "-RM-".

smci
JOOC
    Welcome JOOC. What have you tried? – sfjac Dec 21 '21 at 21:36
  • 2
    Please also provide an example of what you expect the result to look like, it's not clear from your description. What have you tried yourself, what problems did you run into? https://stackoverflow.com/help/how-to-ask – Grismar Dec 21 '21 at 21:37
  • This is a pandas regex question. Please make sure to tag [tag:pandas]. Also, there are many duplicates, please search for them. – smci Dec 21 '21 at 22:10
  • [**`df['Column1'].str.replace(pat, repl, ...)`**](https://stackoverflow.com/questions/28986489/how-to-replace-text-in-a-string-column-of-a-pandas-dataframe) , see that duplicate question. The rest is just finding the specific regex for your case. – smci Dec 22 '21 at 07:36

3 Answers


Assuming all you want is the part after the prefix (e.g. CT 500M) and every value follows the same format, apply a lambda function that splits on "-" and takes the third element (index 2):

 df["Column1"] = df["Column1"].apply(lambda s: s.split("-")[2])

You could also split on "-RM-" and keep the last piece.
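As a minimal sketch of that alternative, the split can be done with the vectorized `Series.str.split` instead of a row-wise lambda. The sample values are copied from the question without the bold markers, and the `Suffix` column name is only an illustrative choice.

```python
import pandas as pd

# Sample data from the question, assuming the bold markers are not part of the strings.
df = pd.DataFrame({"Column1": ["Yon-RM-CT 500M", "Abib-RM-CT 500M",
                               "Wal-RM-CT 500M", "Sopxc-RM-CT 1000M"]})

# Split at most once on "-RM-" and keep the last piece of each row.
# Rows that do not contain "-RM-" are left unchanged.
df["Suffix"] = df["Column1"].str.split("-RM-", n=1).str[-1]

print(df["Suffix"].tolist())  # ['CT 500M', 'CT 500M', 'CT 500M', 'CT 1000M']
```

Splitting on the full "-RM-" marker rather than on every "-" keeps the result intact even if the remaining text contains hyphens of its own.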

gotenks

Use re.sub() from the re module to replace the part you don't want with ''. Apply it to the column. Something like this should work:

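import re
for idx, value in enumerate(Column1):
    # assign back by index; rebinding the loop variable alone changes nothing
    Column1[idx] = re.sub(r'^\*.*\*', '', value)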

or

Column1 = [re.sub(r'^\*.*\*', '', i) for i in Column1]

The pattern ^\*.*\* matches everything from a leading * through the last * in the string, because .* is greedy. re.sub() replaces each match with whatever replacement you supply.

Here's the documentation
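To make the greedy behaviour concrete, here is a small standard-library sketch. It assumes, as this answer does, that the asterisks are literally present in the strings; the sample values are hypothetical.

```python
import re

# Hypothetical values with literal asterisks, as this answer assumes.
values = ["**Yon-RM-**CT 500M", "**Sopxc-RM-**CT 1000M"]

# Greedy pattern: from a leading "*" through the LAST "*" in the string.
pattern = re.compile(r"^\*.*\*")

cleaned = [pattern.sub("", v) for v in values]
print(cleaned)  # ['CT 500M', 'CT 1000M']

# A stray asterisk later in the string is swallowed by the greedy match.
greedy_result = pattern.sub("", "**A-RM-**CT *500M")
print(greedy_result)  # 500M
```

If the remaining text may itself contain asterisks, a lazy quantifier (`.*?`) bounded by the closing `**` is the safer choice.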

Parker.V

Assuming you want to remove everything between double asterisks, use Series.str.replace with a regex ('\*\*.*?\*\*'):

df['Column1'] = df['Column1'].str.replace(r'\*\*.*?\*\*', '', regex=True)

Output:

    Column1
0   CT 500M
1   CT 500M
2   CT 500M
3  CT 1000M
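If the asterisks in the question are only bold formatting and not part of the data, the same `Series.str.replace` call can anchor on the "-RM-" marker instead. The following is a sketch under that assumption, with sample values taken from the question.

```python
import pandas as pd

# Sample data from the question, assuming no literal asterisks in the strings.
df = pd.DataFrame({"Column1": ["Yon-RM-CT 500M", "Abib-RM-CT 500M",
                               "Wal-RM-CT 500M", "Sopxc-RM-CT 1000M"]})

# ^.*?-RM- : from the start of the string, lazily up to and including
# the first "-RM-". Rows without "-RM-" are left unchanged.
df["Column1"] = df["Column1"].str.replace(r"^.*?-RM-", "", regex=True)

print(df["Column1"].tolist())  # ['CT 500M', 'CT 500M', 'CT 500M', 'CT 1000M']
```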
mozway