
I have this sentence: "int open(const char *" pathname ", int " flags );

I am trying to find a regex that extracts the words outside the double quotes, in this example "pathname" and "flags". The regex I wrote only catches "flags" and misses "pathname". Here is what I have:

 reg2 = r"""(\".*\" (.*) )+\);"""
 pattern2 = re.compile(reg2)

 inner = m.group(1)
 m2 = pattern2.search(inner)
 EntityI = m2.group(2)
 print EntityI

Note: m.group(1) is: "int open(const char *" pathname ", int " flags );

Thanks for the help!

Edit: Just to clarify some more, another possible case could be:

"int open(const char *" pathname ", int " flags ", mode_t " mode );

And I would want to extract the words: "pathname", "flags", and "mode".

lilshadowy

2 Answers


This is a perfect case for the trash-can approach: match everything, but keep only what lands in capture group 1 and discard the rest.

".*?"|(\w+)

Explanation: the pattern consists of two alternatives separated by |:

  • ".*?" matches a quoted string from its opening quote to the next quote. The quotes act as anchors, the . matches any character, and the * quantifier repeats it any number of times. The ? makes the star lazy, so it matches as few characters as possible instead of greedily running on to the last quote in the line. Nothing here is captured, so these matches are the "trash".
  • (\w+) is a capture group (the parentheses) that captures one or more (+) word characters. \w is a shorthand character class; for ASCII input it is equivalent to the character class [a-zA-Z0-9_].
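To see why the lazy ? matters, here is a minimal sketch that compares the greedy and lazy forms of the quoted alternative on the string from the question. The variable names are only for illustration. The greedy form swallows everything up to the last quote, including pathname, which is the same kind of over-matching that makes the regex in the question miss it.

```python
import re

s = '"int open(const char *" pathname ", int " flags );'

# Greedy: .* runs to the LAST quote on the line.
greedy = re.search(r'".*"', s).group()
# Lazy: .*? stops at the NEXT quote.
lazy = re.search(r'".*?"', s).group()

print(greedy)  # "int open(const char *" pathname ", int "
print(lazy)    # "int open(const char *"
```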

Sample code:

import re

# Quoted segments match the first alternative and leave group 1 empty;
# bare words match the second alternative and land in group 1.
regex = r'".*?"|(\w+)'
test_str = "\"int open(const char *\" pathname \", int \" flags );"

for match in re.finditer(regex, test_str):
    if match.group(1):  # skip the quoted segments
        print("Found at {start}-{end}: {group}".format(start=match.start(1), end=match.end(1), group=match.group(1)))

Output:

Found at 24-32: pathname
Found at 42-47: flags
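The same pattern should also handle the three-parameter case from the edit in the question. A minimal sketch, assuming you only need the names and not their positions; the helper name names_outside_quotes is made up for this example:

```python
import re

PATTERN = re.compile(r'".*?"|(\w+)')

def names_outside_quotes(text):
    # With a single capture group, findall returns the group-1 text of
    # every match; quoted segments contribute an empty string.
    return [name for name in PATTERN.findall(text) if name]

line = '"int open(const char *" pathname ", int " flags ", mode_t " mode );'
print(names_outside_quotes(line))  # ['pathname', 'flags', 'mode']
```

Filtering out the empty strings is the "trash can" step: the quoted parts are matched so they get consumed, but never kept.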
wp78de
  • Hi, thanks for the help! But I am trying to refrain from changing the original string. – lilshadowy Jun 06 '18 at 01:34
  • @lilshadowy I cannot follow. The original string is not changed. – wp78de Jun 06 '18 at 01:48
  • Nvm, sorry about the confusion. I saw the backslashes and thought you were altering the original string. Your code works! Just a couple of quick questions: what are the backslashes for? Also, can you explain why the regex works? I am new to this stuff. Thanks! – lilshadowy Jun 06 '18 at 02:02
  • @lilshadowy Yes, the original string is escaped in an odd way. The backslashes escape special characters inside the string literal, e.g. double quotes inside a double-quoted string. – wp78de Jun 06 '18 at 02:09
  • Ohh, I see. Thanks! – lilshadowy Jun 06 '18 at 02:30

Here's one way: replace everything inside quotes with a placeholder, then split the resulting string on that placeholder. You'll probably want to do more processing, since the surrounding spaces and the trailing ); are also outside the quotes.

import re
my_string = '"int open(const char *" pathname ", int " flags );'
# Replace each quoted segment with '_', split on it, and drop the empty first piece.
print(re.sub(r'".*?"', '_', my_string).split('_')[1:])
## [' pathname ', ' flags );']
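As a possible follow-up for the extra processing, here is a sketch that splits directly on the quoted segments with re.split instead of going through a placeholder (so names that contain an underscore are not broken apart) and then keeps the first word of each piece. The function name split_outside_quotes is hypothetical, and it assumes there is at most one name between two quoted segments.

```python
import re

def split_outside_quotes(text):
    # Split on the quoted segments, leaving only the text between them.
    chunks = re.split(r'".*?"', text)
    names = []
    for chunk in chunks:
        # Keep the first run of word characters, dropping spaces and ');'.
        found = re.search(r'\w+', chunk)
        if found:
            names.append(found.group())
    return names

my_string = '"int open(const char *" pathname ", int " flags );'
print(split_outside_quotes(my_string))  # ['pathname', 'flags']
```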
Calum You