0

I have the following string example:

'NAME: "test1",  DESCR: "AAA 1111S ABC 48 BB (4 BBBB) TEST1 "'

I am trying to extract "AAA 1111S" but can't work out what is wrong with my regex.

The regex I am using is below. I thought it would work by matching up to the first space, then the second space, plus the remaining characters in the string.

^.+(AAA\s.+)\s.+"$

But it only pulls out the following:

AAA 1111S ABC 48 BB (4 BBBB) TEST1

john johnson

2 Answers

0

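In your regex you use (AAA\s.+), where .+ matches any character 1+ times. Because .+ is greedy, it first runs to the end of the string and then backtracks only as far as needed for the trailing \s.+"$ to match, which here is the last part, TEST1 ". The group therefore captures far more than AAA 1111S.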

You could use a positive lookbehind (?<=") to assert that what is directly on the left is a double quote. Then match AAA followed by 1+ whitespace characters \s+ and 1+ non-whitespace characters \S+.

(?<=")AAA\s+\S+
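As a minimal sketch of how the lookbehind pattern could be applied in Python (the variable names `text`, `match` and `result` are only illustrative, and the sample string is the one from the question):

```python
import re

text = 'NAME: "test1",  DESCR: "AAA 1111S ABC 48 BB (4 BBBB) TEST1 "'

# (?<=") asserts a double quote immediately before AAA without consuming it.
# \s+ matches the whitespace after AAA, \S+ stops at the next whitespace.
match = re.search(r'(?<=")AAA\s+\S+', text)
result = match.group() if match else None

print(result)  # AAA 1111S
```

The lookbehind is not part of the match, so no capture group is needed here.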

If you want to keep the anchor ^ and match from the first occurrence of AAA, you could use .+?, which matches any character 1+ times non-greedily (as few times as possible).

^.+?(AAA\s+\S+)
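To compare the two behaviours side by side, here is a small Python sketch that runs the original pattern and the non-greedy one against the string from the question. The names `greedy` and `lazy` are just labels chosen for this example.

```python
import re

text = 'NAME: "test1",  DESCR: "AAA 1111S ABC 48 BB (4 BBBB) TEST1 "'

# Original pattern: the .+ inside the group is greedy, so the group
# keeps going well past "AAA 1111S".
greedy = re.search(r'^.+(AAA\s.+)\s.+"$', text)

# Non-greedy prefix, and \S+ stops the group at the next whitespace.
lazy = re.search(r'^.+?(AAA\s+\S+)', text)

print(greedy.group(1))
print(lazy.group(1))  # AAA 1111S
```

With this pattern the wanted text is in capture group 1, not in the whole match.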

The fourth bird
  • I think you can have the same result without the lookbehind. – Alex G Oct 20 '18 at 14:08
  • @AlexG That is true but in the example data there is another part that has 3 upper case characters. So I have used the positive lookbehind to be more specific where to match AAA. – The fourth bird Oct 20 '18 at 14:12
0
>>> import re
>>> string = 'NAME: "test1",  DESCR: "AAA 1111S ABC 48 BB (4 BBBB) TEST1 "'
>>> # exactly 3 word characters, one whitespace character, exactly 5 word characters
>>> sol = re.findall(r'\w{3}\s\w{5}', string)
>>> sol
['AAA 1111S']
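The fixed counts {3} and {5} only fit values of exactly that length, so a shorter second token such as 111S would not match. A hedged sketch of a variable-length variant follows. It assumes the wanted text is always the first two whitespace-separated tokens directly after a double quote. The second sample string is hypothetical, and the lookbehind (?<=") is an addition that is not part of the pattern above.

```python
import re

# Only the first sample comes from the question; the second is made up
# to show a shorter second token.
samples = [
    'NAME: "test1",  DESCR: "AAA 1111S ABC 48 BB (4 BBBB) TEST1 "',
    'NAME: "test2",  DESCR: "AAA 111S ABC 48 "',
]

# \w+ instead of \w{3} and \w{5}: any number of word characters per token.
# (?<=") requires a double quote directly before the first token.
pattern = re.compile(r'(?<=")\w+\s\w+')

results = []
for sample in samples:
    match = pattern.search(sample)
    results.append(match.group() if match else None)

print(results)  # ['AAA 1111S', 'AAA 111S']
```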
1UC1F3R616
  • \w matches an alphanumeric character, and I used {m} to require an exact number of them. Is that what you wanted? :) Please don't downvote; I answered based on how I understood the wording of your question. If you wanted something else, please comment. – 1UC1F3R616 Oct 20 '18 at 14:14
  • You should add an explanation to your code. Code-only answers are discouraged. – rawwar Oct 20 '18 at 14:49
  • So 1111S could be 111S or ABC; it could change over time, but there will always be a space between 111S and ABC, even if it were, say, "222S XYZ2", and also a space between AAA and 1111S. – john johnson Oct 20 '18 at 15:46