Regex to extract 2 lists of connected words

Question

I want to extract 2 lists of words that are connected by the sign =. The regex code works for separate lists but not in combination.

Example string: bla word1="word2" blabla abc="xyz" bla bla

One output shall contain the words directly left of =, i.e. word1, abc and the other output shall contain the words directly right of =, i.e. word2, xyz without quotes.

\w+(?==\"(?:(?!\").)*\") extracts the words left of =, i.e. word1,abc

=\"(?:(?!\").)*\" extracts the words right of = including quotes and =, i.e. ="word2",="xyz"

How can I combine these 2 queries to a single regex-expression that outputs 2 groups? Quotes and equal signs shall not be outputted.

score 2 · Answer 1 · answered Nov 14 '21 at 20:10

If you are looking for lhs and rhs from lhs="rhs", this should work (sorry, this is what I understood from your question):

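import re

test_str = 'abc="def" ghi'
# Raw string so that \w reaches the regex engine unchanged.
# Group 1 is the word left of =, group 2 the word inside the quotes.
ans = re.search(r'(\w+)="(\w+)"', test_str)
print(ans.group(1))  # left of =
print(ans.group(2))  # right of =, without quotes
my_list = list(ans.groups())
print(my_list)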
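
Note that re.search only returns the first pair. As a minimal sketch (the variable names text, pairs, left and right are illustrative, not from the question), the same pattern could be applied to the question's example string with re.findall to collect every pair and split them into two lists:

```python
import re

# Example string from the question.
text = 'bla word1="word2" blabla abc="xyz" bla bla'

# Same pattern as in the snippet above, written as a raw string.
pattern = re.compile(r'(\w+)="(\w+)"')

# With two capture groups, findall returns one (left, right) tuple per match.
pairs = pattern.findall(text)

# Split the tuples into two parallel lists.
left = [lhs for lhs, _ in pairs]
right = [rhs for _, rhs in pairs]

print(left)   # ['word1', 'abc']
print(right)  # ['word2', 'xyz']
```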

score 2 · Accepted Answer · answered Nov 14 '21 at 20:26

You can use

([^\s=]+)="([^"]*)"

See the regex demo. Details:

([^\s=]+) - Group 1: one or more occurrences of a char other than whitespace and = char
=" - a =" substring
([^"]*) - Group 2: zero or more chars other than " char
" - a " char.

Note: \w+ only matches one or more letters, digits and underscores, and won't match if the keys contain, say, hyphens. The (?:(?!\").)* tempered greedy token is not efficient, and does not match line break chars. As the negative lookahead only contains a single-char pattern (\"), it is more efficient to write it as a negated character class, [^"]*. Unlike the tempered greedy token, the negated character class also matches line break chars. If you do not want that behavior, just add \r\n to the negated character class, i.e. [^"\r\n]*.
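
A minimal Python sketch of how this pattern might be used to build the two lists. The helper name split_pairs and the second test string are assumptions made for illustration; only the pattern itself comes from the answer above.

```python
import re

# Pattern from the answer: group 1 is the key, group 2 the quoted value.
PAIR_RE = re.compile(r'([^\s=]+)="([^"]*)"')

def split_pairs(text):
    """Return (keys, values) for every key="value" pair found in text."""
    pairs = PAIR_RE.findall(text)          # list of (key, value) tuples
    keys = [key for key, _ in pairs]
    values = [value for _, value in pairs]
    return keys, values

# Example string from the question.
keys, values = split_pairs('bla word1="word2" blabla abc="xyz" bla bla')
print(keys)    # ['word1', 'abc']
print(values)  # ['word2', 'xyz']

# Hypothetical input: a hyphenated key, a value with a space, an empty value.
keys2, values2 = split_pairs('data-id="a b" empty=""')
print(keys2)    # ['data-id', 'empty']
print(values2)  # ['a b', '']
```

The second call illustrates the points in the note: a \w+ based pattern would miss the key data-id and the value a b, while the negated character classes capture both.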

score 1 · Answer 3 · answered Nov 14 '21 at 20:07

This should do what you want:

(?: (\w*)=)(?:\"(\w*)\")

This is for a Python regex.

You can see it working here.
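
One thing to be aware of: the pattern begins with a literal space, so a key at the very start of the string is not matched. A short sketch, using the question's example string and the test string from Answer 1 (the variable names are illustrative):

```python
import re

# Pattern from this answer; note the literal space before the first group.
pattern = re.compile(r'(?: (\w*)=)(?:"(\w*)")')

# Every key in the question's example is preceded by a space, so both pairs match.
in_sentence = pattern.findall('bla word1="word2" blabla abc="xyz" bla bla')
print(in_sentence)  # [('word1', 'word2'), ('abc', 'xyz')]

# Here the key is at the start of the string, with no space before it.
at_start = pattern.findall('abc="def" ghi')
print(at_start)  # []
```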

0xd34dc0de
