
I have been working with Lex and ran into the problem of finding comments in a C program and removing them from the final output. To do this, I need to identify any occurrence of */ (this is how a traditional multiline comment ends).

My problem then reduces to the subproblem of accepting any sequence of characters other than */. I tried several ways of accomplishing this, including [^*\/], which, understandably, did not work. Any help or suggestion is appreciated.

Brian
Bishwajit Purkaystha
  • `[^*\/]` matches a single character that is neither `*` nor `/`. Does lex use PCRE? If so, I think `\*\/(*SKIP)(*FAIL)|.` would work. (Also, if `/` is not a delimiter, it doesn't need to be escaped.) – chris85 Jan 10 '17 at 16:00
  • @Bishwajit Purkaystha Check out the Regex101.com links. That site has an explanation of the regex on the right that will do a far better job of explaining it than I can. – MattSizzle Jan 10 '17 at 16:18
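As chris85 notes, `[^*\/]` is a bracket expression: it matches exactly one character that is neither `*` nor `/`, so it cannot describe a whole sequence that avoids the pair `*/`. A sequence avoids `*/` exactly when every run of `*` is followed by a character other than `*` or `/`, or ends the text, which can be written without any lookaround as `([^*]|\*+[^*/])*\**`. The Python sketch below only illustrates the difference; Python's `re` is not lex's regex engine, and the helper names are hypothetical.

```python
import re

# A bracket expression matches exactly ONE character: here, anything but '*' or '/'.
SINGLE_CHAR = re.compile(r"[^*/]")

# A whole sequence that never contains the pair "*/": every run of '*' must be
# followed by a character other than '*' or '/', except a run at the very end.
# No lookaround or lazy quantifiers are used, and [^*] also matches newlines.
NO_CLOSE_MARKER = re.compile(r"(?:[^*]|\*+[^*/])*\**")


def is_one_allowed_char(text: str) -> bool:
    """True only for a one-character string that is neither '*' nor '/'."""
    return SINGLE_CHAR.fullmatch(text) is not None


def lacks_close_marker(text: str) -> bool:
    """True if `text`, which may span several lines, never contains "*/"."""
    return NO_CLOSE_MARKER.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["a", "ab", "a*b / c**", "is this */ how"]:
        print(repr(sample), is_one_allowed_char(sample), lacks_close_marker(sample))
```

The Python version uses a non-capturing group `(?:...)`; in lex, plain parentheses serve the same purpose.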

2 Answers


You need to match any character sequence followed by the two-character sequence */ literally. A positive look-ahead will allow you to achieve this. For possible multiple */ occurrences, you also need a positive look-behind for either the start of the text or a preceding */ ((?<=^|\*\/)):

(?<=^|\*\/).*?(?=\*\/)

Demo: https://regex101.com/r/y88e0T/2

Dmitry Egorov
  • But this doesn't work in this case. For the input `is this */ how you do */` I want to find the two segments [is this] and [how you do], but your solution gives [is this */ how you do]. Moreover, the comments may span multiple lines. – Bishwajit Purkaystha Jan 10 '17 at 16:11
  • Yes, I saw it. Could you point me to where I can learn regular expressions effectively? Thanks for answering. – Bishwajit Purkaystha Jan 10 '17 at 16:19
  • @BishwajitPurkaystha, the http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean reference question is a good intro. http://www.regular-expressions.info/ is a great resource, but not easy to absorb because of the sheer amount of information; still, I strongly advise you to look at it. The [SO info page](http://stackoverflow.com/tags/regex/info) has quite a number of useful links on the topic. And, as MattSizzle already said, check out the Regex101.com links. – Dmitry Egorov Jan 10 '17 at 16:26
  • Thanks @Dmitry Egorov for taking time to write for me! – Bishwajit Purkaystha Jan 10 '17 at 16:29
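Neither the look-behind `(?<=...)` nor the lazy `.*?` used in this answer is available in lex or flex patterns. Below is a minimal Python sketch of the splitting Bishwajit asks for, using only constructs lex also has: `([^*]|\*+[^*/])*\*+/` matches a run that cannot contain `*/`, followed by the `*/` that ends it. Repeated matching therefore yields one segment per terminator, with newlines included. The helper name `segments_before_close` is hypothetical.

```python
import re

# One "*/"-free run followed by the "*/" that ends it. The body can never
# step over a "*/", so each match stops at the FIRST terminator; no look-behind
# or lazy ".*?" is needed. "\*+/" may also absorb extra stars ("a**/").
SEGMENT_THEN_CLOSE = re.compile(r"(?:[^*]|\*+[^*/])*\*+/")


def segments_before_close(text: str) -> list[str]:
    """Hypothetical helper: the exact text preceding each "*/" in `text`."""
    # Dropping the last two characters removes only the "*/" itself, so any
    # extra stars swallowed by "\*+/" stay in the segment.
    return [m.group(0)[:-2] for m in SEGMENT_THEN_CLOSE.finditer(text)]


if __name__ == "__main__":
    print(segments_before_close("is this */ how you do */"))
    print(segments_before_close("spans\ntwo lines */ and a**/"))
```

In a lex specification, `/` outside a bracket expression is the trailing-context operator, so a rule that discards whole C comments would quote it, e.g. `"/*"([^*]|\*+[^*/])*\*+"/"` with an empty action.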

I would use a combination of lookahead and lookbehind to achieve the top-level solution.

Regex101

/(?<=\/\*).*(?=\*\/)/s

This will ensure that everything between /* and */ (including newlines, given the s modifier) is matched.

For "What is the regex for any set of characters other than */":

Regex101

That is as simple as:

/[^\/\*]/
MattSizzle
  • You seem to have swapped the lookahead and the lookbehind. BTW, they're zero-length matches, so there's no need for the capture group; the entire match will give you what is required: https://regex101.com/r/8Zpmhu/2 – Dmitry Egorov Jan 10 '17 at 16:11
  • @Dmitry Egorov You are correct. This is my morning warmup "coffee" question. Thanks. I updated my answer. – MattSizzle Jan 10 '17 at 16:15
  • Hey @MattSizzle, your solution worked. But can you please elaborate on how you actually built it? I'm really a novice. Thanks. – Bishwajit Purkaystha Jan 10 '17 at 16:15
  • `[^\/\*]` is not "any set of characters other than */"; it is any single character other than `*` **or** `/`. – chris85 Jan 10 '17 at 16:21
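With the `s` modifier, the greedy `.*` in `(?<=\/\*).*(?=\*\/)` extends from the first `/*` to the last `*/` in the input, so a file with two comments also loses the code between them. The sketch below compares the same greedy idea against the stricter comment pattern. In the greedy version the delimiters are kept in the match so they can be deleted. This is a Python illustration under simplifying assumptions: `strip_comments` and the sample source are hypothetical, and comments are simply deleted (C itself replaces each comment with one space). It also does not handle `/*` inside string literals.

```python
import re

# Greedy: with DOTALL, ".*" runs to the LAST "*/" in the whole input.
GREEDY = re.compile(r"/\*.*\*/", re.DOTALL)

# Precise: the body cannot contain "*/", so each comment ends at its own "*/".
# Limitation: "/*" inside a string literal and "//" comments are not handled.
COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def strip_comments(source: str, pattern: re.Pattern) -> str:
    """Hypothetical helper: delete every match of `pattern` from `source`."""
    return pattern.sub("", source)


if __name__ == "__main__":
    sample = "int a; /* one */ int b; /* two\n   lines */ int c;"
    print(repr(strip_comments(sample, GREEDY)))   # loses "int b;"
    print(repr(strip_comments(sample, COMMENT)))  # keeps all three declarations
```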