The text files I receive contain lots of special characters, and I am trying to find and replace them with other strings. For example:

import re

s = "BRAND~*HP*5250*RAM128*GPUAMD7990*~"

df = re.findall('BRAND~*(.+?)*~', s)

print(df)

The result is ['*']

I want the result to be HP*5250*RAM128*GPUAMD7990

How do I achieve this?

Pranav Hosangadi
Maymay

4 Answers

Escape the * in the pattern:

import re

s = "BRAND~*HP*5250*RAM128*GPUAMD7990*~"

out = re.findall(r'BRAND~\*(.+?)\*~', s)

Output:

['HP*5250*RAM128*GPUAMD7990']
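
If the delimiters are not known in advance, escaping them by hand gets error-prone. A minimal sketch, assuming the delimiters are plain strings (the names `start` and `end` are hypothetical, not from the question), is to let `re.escape` do the escaping:

```python
import re

s = "BRAND~*HP*5250*RAM128*GPUAMD7990*~"

# Hypothetical names for the literal delimiters around the wanted text.
start, end = "BRAND~*", "*~"

# re.escape backslash-escapes regex metacharacters such as '*',
# so both delimiters are matched literally.
pattern = re.escape(start) + r"(.+?)" + re.escape(end)

out = re.findall(pattern, s)
print(out)
```

This should give the same list as the hand-escaped pattern above.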
mozway

You need to escape the '*'s:

df = re.findall(r'BRAND~\*(.+?)\*~', s)
JustLearning

You want to

  1. match the asterisk * as a literal character, and
  2. strip the remaining * characters

PS: I see your question has been edited. You may disregard the 'strip' part; it seems you no longer need it.

[match / find]

import re

s = "BRAND~*HP*5250*RAM128*GPUAMD7990*~"

## find matches based on the regex pattern
df = re.findall(r'BRAND~\*(.+?)\*~', s)
df

[out] ['HP*5250*RAM128*GPUAMD7990']

[strip / remove]

## remove every literal asterisk from each match
r = re.compile(r'\*')
[r.sub('', w) for w in df]

[out] ['HP5250RAM128GPUAMD7990']
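
Since the character being stripped is a fixed literal, a regex is not strictly needed for this step. A small sketch of the same removal using plain `str.replace` (the name `stripped` is hypothetical):

```python
# Result of the findall step above.
df = ['HP*5250*RAM128*GPUAMD7990']

# str.replace treats '*' as an ordinary character, so no escaping is required.
stripped = [w.replace('*', '') for w in df]
print(stripped)
```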


PS: Others might be able to help with more Pythonic code!

PS: You may want to take a look at "Find and replace string values in list [duplicate]".

semmyk-research
  • Hi semmyk-research. Your post is right on. Just a small warning: forcing a match between delimiters using `.+?` could consume a delimiter if the content is empty. The adverse effect is that the match runs on to the next valid delimiter it finds. – sln Jan 23 '23 at 21:09
  • @sln Thanks for pointing this out. #Appreciated. Some of these things are easy to overlook, to one's detriment down the line. Apparently, `.+` gets even greedy~ – semmyk-research Jan 24 '23 at 10:37
  • Yeah, so `.*?`, which checks for the end delimiter at every single character, might always be the better choice in these cases. – sln Jan 27 '23 at 21:15
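
The point raised in these comments can be seen with a small made-up input in which the first pair of delimiters encloses nothing. This is only a sketch; the sample string and variable names are hypothetical:

```python
import re

# Hypothetical input: the first BRAND block is empty, the second holds "HP".
sample = "BRAND~**~ and BRAND~*HP*~"

# '.+?' must consume at least one character, so it swallows the first
# closing delimiter and keeps going until the next '*~'.
with_plus = re.findall(r'BRAND~\*(.+?)\*~', sample)

# '.*?' may match nothing, so the empty block is reported as ''.
with_star = re.findall(r'BRAND~\*(.*?)\*~', sample)

print(with_plus)  # ['*~ and BRAND~*HP']
print(with_star)  # ['', 'HP']
```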

A pair of lookarounds will get the center.

(?<=BRAND~\*).*?(?=\*~)

https://regex101.com/r/zdkmYD/1

But replacing specific characters within that match requires a callback function.
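
As a rough illustration of the callback approach in Python, `re.sub` accepts a function as the replacement argument. The sketch below assumes the goal is to turn each `*` inside the match into a space; the function name `swap_asterisks` and the choice of a space are hypothetical:

```python
import re

s = "BRAND~*HP*5250*RAM128*GPUAMD7990*~"

# The lookarounds match only the text between the delimiters.
pattern = re.compile(r'(?<=BRAND~\*).*?(?=\*~)')

def swap_asterisks(match):
    # Hypothetical callback: replace every '*' inside the match with a space.
    return match.group(0).replace('*', ' ')

result = pattern.sub(swap_asterisks, s)
print(result)  # BRAND~*HP 5250 RAM128 GPUAMD7990*~
```

The delimiters themselves are left untouched because the lookarounds do not consume them.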

sln