1

I am learning regex and trying it out on this string:

"DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"

I am trying to find every occurrence of REF in this string, so I used this:

import re

doom = 'REF'
boom = "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"

# print(i)
# print('Found "%s" in "%s" ->' % (i, boom), end='')
print(re.findall(r"\b" + doom + "*", boom))
if re.search(doom, boom):
    print("found")

Output:

['REFF', 'REF']
found

This does not give me exactly REF each time. I also want to check which character comes right after "REF":

"REFFERRARO" -> the character after "REF" is "F"
"REF:FINE" -> the character after "REF" is ":"

If the character after "REF" is anything other than ":", I want to insert ":" after "REF".

Example:

String: "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"
Output: "DBZ:00000*{6000}/ONE/REF:FERRARO REF:FINE DOGS*"

UPDATE:

As suggested, I used re.sub and got this:

print(re.compile('REF').sub("REF:", boom))

Output:

"DBZ:00000*{6000}/ONE/REF:FERRARO REF::FINE DOGS*"

New UPDATE:

I tried this and it worked, but I don't think it is a valid solution, because count=1 only fixes the first match and would miss any further occurrences like "REFFERRARO":

print(re.compile('REF').sub("REF:", boom,count=1))
El_Dorado

3 Answers

1

Your pattern '\bREF*' looks for a word boundary followed by 'RE' and then 'F' repeated 0 to n times, because the * applies only to the 'F'. That's why you get 'REFF' and 'REF'.

You probably want r'\bREF.*\b'.
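As a quick check of what that pattern returns, here is a small sketch run against the string from the question. The variable names `greedy` and `non_raw` are only illustrative. It also shows why the same pattern built without the r prefix on the last piece finds nothing: in a normal string literal, "\b" is a backspace character, not a word boundary.

```python
import re

boom = "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"

# Raw string: \b is a word boundary. The greedy .* runs to the end of the
# line and then backs off to the last word boundary it can find.
greedy = re.findall(r"\bREF.*\b", boom)
print(greedy)   # ['REFFERRARO REF:FINE DOGS']

# Non-raw ".*\b": here "\b" is the backspace character (ASCII 8), so the
# pattern requires a literal backspace that the text does not contain.
non_raw = re.findall(r"\b" + "REF" + ".*\b", boom)
print(non_raw)  # []
```

So the greedy version returns one long match rather than the individual REF words, which is why a substitution is the better fit for the stated goal.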

To insert the missing ':', use re.sub:

import re


pattern = r'\bREF([^:])' # REF followed by NOT a : - capture the single char

# \1 inserts the single char after REF without : again
correct = re.sub(pattern, r'REF:\1', "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*")

print(correct)  

Output:

DBZ:00000*{6000}/ONE/REF:FERRARO REF:FINE DOGS*
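If the tag comes from a variable (like `doom` in the question), the same idea can be wrapped in a helper. This is only a sketch: the function name `add_colon_after` is made up for illustration, and `re.escape` is used on the assumption that the tag should be matched literally.

```python
import re

def add_colon_after(tag: str, text: str) -> str:
    """Insert ':' after each word-initial `tag` that is followed by a non-colon character."""
    # re.escape keeps any regex metacharacters in the tag literal.
    pattern = re.compile(r"\b" + re.escape(tag) + r"([^:])")
    # A function replacement avoids backslash handling in the replacement string.
    return pattern.sub(lambda m: tag + ":" + m.group(1), text)

boom = "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"
print(add_colon_after("REF", boom))
# DBZ:00000*{6000}/ONE/REF:FERRARO REF:FINE DOGS*
```

One limitation of the capture-group pattern: `[^:]` needs a character to consume, so a REF at the very end of the string is left unchanged.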
Patrick Artner
  • Tried this `print(re.findall(r"\b" + doom + ".*\b", boom))` output `[]` I want to get output like this: `"DBZ:00000*{6000}/ONE/REF:FERRARO REF:FINE DOGS*"` – El_Dorado Dec 31 '19 at 13:18
  • @El you need to substitute then - `re.sub` returns a new string in which whatever your pattern matches is replaced with the replacement you give it - see edit – Patrick Artner Dec 31 '19 at 13:21
  • @El_Dorado fixed missing char and adapted your example – Patrick Artner Dec 31 '19 at 13:25
  • I tried your method. `correct = re.sub(r'\bREF([^:]),r'REF:\1',"REFFERRARO REF:FINE")` It resulted in a syntax error for `r'REF:\1'`. I also tried with the pattern, but that resulted in an error because I was taking the pattern values as str so... – El_Dorado Jan 02 '20 at 07:42
  • @El My code here missed a `'` - added it. – Patrick Artner Jan 02 '20 at 08:36
  • It worked Thank you. Your answer is simple as "Hello world". – El_Dorado Jan 02 '20 at 09:29
1

Here is a way to go with lookaround:

import re

text = "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"  # avoid shadowing the built-in str
print(re.sub(r'(?<=\bREF)(?!:)', ':', text))

Explanation:

(?<=\bREF)  # positive lookbehind: make sure REF (at a word boundary) comes right before
(?!:)       # negative lookahead: make sure no colon comes right after
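Because both lookarounds are zero-width, the pattern matches an empty position rather than any characters, so `re.sub` only inserts the colon. A small sketch with a few extra inputs (the helper name `fix` and the sample strings are made up for illustration) shows that repeated occurrences and a REF at the end of the string are handled, while NONREF is left alone:

```python
import re

# Zero-width pattern: the position right after a word-initial REF
# that is not already followed by a colon.
MISSING_COLON = re.compile(r"(?<=\bREF)(?!:)")

def fix(text: str) -> str:
    return MISSING_COLON.sub(":", text)

print(fix("DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"))
# DBZ:00000*{6000}/ONE/REF:FERRARO REF:FINE DOGS*
print(fix("REFA REFB REF:C"))   # REF:A REF:B REF:C
print(fix("NONREF REF"))        # NONREF REF:
```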

Demo & explanation

marc_s
Toto
  • It worked Awesomely. Thank you. Is there any easy to understand tutorial with multiple examples to learn more about regex? – El_Dorado Jan 02 '20 at 07:46
  • @El_Dorado: Here is a good start: https://www.regular-expressions.info/ – Toto Jan 02 '20 at 10:36
0

First, to fix your current regex \bREF*:
You are applying the * quantifier (match between zero and unlimited times) only to the letter F.
I'm assuming you actually want it for the whole word, so you'd do \b(REF)*, or maybe even \b(?:REF)*.
?: indicates that your () group isn't a capturing group. If you don't yet know what those are, you can mostly set this aside for now. It won't change what the pattern matches, though note that re.findall returns the captured group instead of the whole match when the pattern contains one.
Also, you might want to use + (match one or more times) instead of *, since * also allows an empty match.

And now about how you'd check if there's something next to the last F.
You could use a positive lookahead for example \b(REF)+(?=:).
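A short sketch comparing these patterns on the string from the question may make the difference concrete. The names `with_colon` and `without_colon` are only illustrative, and the negative lookahead `(?!:)` is added here as the counterpart that finds a REF with no colon after it, which is the case the question wants to repair.

```python
import re

boom = "DBZ:00000*{6000}/ONE/REFFERRARO REF:FINE DOGS*"

# * applies to the F only, so the extra F of REFFERRARO is swallowed.
print(re.findall(r"\bREF*", boom))       # ['REFF', 'REF']

# Quantifying the whole (non-capturing) group matches exactly REF each time.
print(re.findall(r"\b(?:REF)+", boom))   # ['REF', 'REF']

# Positive lookahead: only the REF that IS followed by ':'.
with_colon = [m.start() for m in re.finditer(r"\b(?:REF)+(?=:)", boom)]

# Negative lookahead: only the REF that is NOT followed by ':'.
without_colon = [m.start() for m in re.finditer(r"\bREF(?!:)", boom)]

print(with_colon == [boom.index("REF:")])     # True
print(without_colon == [boom.index("REFF")])  # True
```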

Be sure to check out this amazing website to try out different regexes:
https://regex101.com/

0x464e
  • As you said, I tried that site with this: `r"REF[^:]"` and got the output `[REFF]`. Now how do I replace that with "REF:FERRARO"? And if there is something like `NONREF`, it shouldn't be replaced. – El_Dorado Dec 31 '19 at 13:56