What is the regex for any set of characters other than "*/"

Question

I have been working with Lex and ran into the problem of finding comments in a C program and removing them from the final output. To accomplish this I need to identify any occurrence of */ (which is how a traditional multi-line comment ends).

My problem then reduces to the subproblem of accepting any sequence of characters that does not contain */. I tried several ways of accomplishing this, including [^*\/], which, reasonably, did not work. Any help or suggestions would be appreciated.

`[^*\/]` matches a single character that is not `*` or `/`. Does lex use PCRE? If so, I think `\*\/(*SKIP)(*FAIL)|.` would work. (Also, if `/` is not a delimiter, it doesn't need to be escaped.) – chris85, Jan 10 '17 at 16:00
@Bishwajit Purkaystha Check out the Regex101.com links. That site has an explanation of the regex on the right that will do a far better job of explaining it than I can. – MattSizzle, Jan 10 '17 at 16:18
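
To illustrate the point about the character class, here is a minimal sketch using Python's `re` module as a stand-in (an assumption for demonstration only; Lex has its own regex dialect). The helper name `runs` is hypothetical. It shows that `[^*/]` rejects every `*` and every `/` on its own, not just the pair `*/`:

```python
import re

# Character class from the question: ONE character that is neither '*' nor '/'.
NOT_STAR_OR_SLASH = re.compile(r"[^*/]+")

def runs(text):
    """Return the maximal runs of characters that are neither '*' nor '/'."""
    return NOT_STAR_OR_SLASH.findall(text)

# Looks right when '*' and '/' only ever appear together as "*/" ...
print(runs("is this */ how you do */"))  # ['is this ', ' how you do ']

# ... but a lone '/' or '*' also splits the text, which is not what was asked.
print(runs("a / b * c */"))              # ['a ', ' b ', ' c ']
```

So the class happens to work on the sample input but breaks as soon as a comment body contains a division sign or an asterisk by itself.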

Dmitry Egorov · Answer 1 · 2017-01-10T16:15:51.747

1

You need to match any character sequence that is followed by the literal two-character sequence */. A positive look-ahead will allow you to achieve this. To handle multiple */ occurrences you also need a positive look-behind for either the start of the text or a preceding */, i.e. (?<=^|\*\/):

(?<=^|\*\/).*?(?=\*\/)

Demo: https://regex101.com/r/y88e0T/2
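
A rough way to try this pattern outside Regex101 is sketched below in Python (an assumption; the answer itself targets a PCRE-style engine). Python's `re` only accepts fixed-width look-behind, so `(?<=^|\*\/)` is rewritten here as the alternation `(?:^|(?<=\*/))`, which is intended to be equivalent. `re.DOTALL` is added so that `.` also crosses newlines, and the function name `segments_before_end` is made up for the example:

```python
import re

# (?:^|(?<=\*/))  start of text, or a position right after a "*/"
# .*?             as few characters as possible
# (?=\*/)         ... up to (not including) the next "*/"
BEFORE_END = re.compile(r"(?:^|(?<=\*/)).*?(?=\*/)", re.DOTALL)

def segments_before_end(text):
    """Return each piece of text that is terminated by a '*/'."""
    return [m.group(0) for m in BEFORE_END.finditer(text)]

print(segments_before_end("is this */ how you do */"))
# ['is this ', ' how you do ']

print(segments_before_end("line one\nline two */ rest */"))
# ['line one\nline two ', ' rest ']
```

The lazy quantifier `.*?` is what keeps each match from running past the first `*/` it meets.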

edited Jan 10 '17 at 16:15

answered Jan 10 '17 at 16:01


But this doesn't work in this case. For the input `is this */ how you do */` I want to find the two segments [is this] and [how you do], but your solution gives [is this */ how you do]. Moreover, the comments may span multiple lines. – Bishwajit Purkaystha Jan 10 '17 at 16:11
Yes, I saw it. Could you direct me to where I can learn regular expressions effectively? Thanks for answering. – Bishwajit Purkaystha Jan 10 '17 at 16:19
@BishwajitPurkaystha, http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean is good as an intro. http://www.regular-expressions.info/ is a great resource, but not easy to absorb due to the huge amount of information. Still, I strongly advise you to consider it. The [SO info page](http://stackoverflow.com/tags/regex/info) has quite a number of useful links on the topic. And as MattSizzle already said, check out the Regex101.com links. – Dmitry Egorov Jan 10 '17 at 16:26
Thanks @Dmitry Egorov for taking time to write for me! – Bishwajit Purkaystha Jan 10 '17 at 16:29

MattSizzle · Accepted Answer · 2017-01-10T16:15:42.487

1

I would use a combination of lookahead and lookbehind to achieve the top-level solution.

Regex101

/(?<=\/\*).*(?=\*\/)/s

This will ensure that everything between /* and */ is matched, including newlines (thanks to the s modifier).
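
As a hedged illustration, the pattern can be tried in Python, where `re.DOTALL` plays the role of the `s` modifier (Python is an assumption here; the names `GREEDY`, `LAZY` and `sample` are made up for the example). With more than one comment in the input, the greedy `.*` runs from the first `/*` to the last `*/`; a lazy `.*?` variant stops at the nearest `*/` instead:

```python
import re

# The answer's pattern, with re.DOTALL standing in for the /s modifier.
GREEDY = re.compile(r"(?<=/\*).*(?=\*/)", re.DOTALL)

# Same pattern with a lazy quantifier.
LAZY = re.compile(r"(?<=/\*).*?(?=\*/)", re.DOTALL)

sample = "/* a */ x /* b */"

# Greedy: a single match spanning both comments and the code between them.
print(GREEDY.findall(sample))  # [' a */ x /* b ']

# Lazy: one match per comment body.
print(LAZY.findall(sample))    # [' a ', ' b ']
```

Which of the two is wanted depends on whether the input can contain several comments.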

For "What is the regex for any set of characters other than “*/”"

Regex101

That is as simple as:

/[^\/\*]/

edited Jan 10 '17 at 16:15

answered Jan 10 '17 at 16:06


You seem to have swapped the lookahead and the lookbehind. BTW, they're zero-length matches, so there is no need for a capture group; the entire match will give you what is required: https://regex101.com/r/8Zpmhu/2 – Dmitry Egorov Jan 10 '17 at 16:11
@Dmitry Egorov You are correct. This is my morning warmup "coffee" question. Thanks. I updated my answer. – MattSizzle Jan 10 '17 at 16:15
Hey @MattSizzle, your solution worked. But can you please elaborate on how you actually built it? I'm really a novice. Thanks. – Bishwajit Purkaystha Jan 10 '17 at 16:15
`[^\/\*]` is not "any set of characters other than `*/`"; it is any single character other than `*` **or** `/`. – chris85 Jan 10 '17 at 16:21
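
For engines without look-ahead or look-behind, one commonly used formulation of "any text that does not contain `*/`" relies only on character classes and alternation. The sketch below tests it with Python's `re` (an assumption for demonstration; it is not taken from either answer, and all names in it are hypothetical):

```python
import re

# Look-around-free pattern for "text that does not contain */":
#   [^*]        any character except '*'
#   \*+[^*/]    a run of '*' followed by something other than '*' or '/'
#   \**         an optional run of '*' at the very end
NO_COMMENT_END = re.compile(r"(?:[^*]|\*+[^*/])*\**")

# The same idea wrapped in the comment delimiters: "/*", a body that never
# contains "*/", then one or more '*' and the closing '/'.
C_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")

def has_no_comment_end(text):
    """True if the whole text matches, i.e. it contains no '*/'."""
    return NO_COMMENT_END.fullmatch(text) is not None

def strip_comments(source):
    """Remove /* ... */ comments; string literals are NOT handled."""
    return C_COMMENT.sub("", source)

print(has_no_comment_end("a * b / c"))  # True
print(has_no_comment_end("a */ b"))     # False
print(strip_comments("int a; /* one **/ int b; /* two\nlines */ int c;"))
# int a;  int b;  int c;
```

This is only a sketch: a `/*` inside a string literal would still be treated as the start of a comment.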
