Why I cannot get "3" by matching [a-zA-Z0-9]+ within boundaries of sequences?

Question

I have to decode frames. The frames are in one long string; each frame begins with "CC" and ends with "DD". I'd like to capture everything between the header and the footer as it is.

I've found all the frames and put them into an array. A sample of the array looks like:

CCdatadfhdfghata1DD
CC3DD
CCdatazxczxczxczxdata3DD

Now I'd like to strip the header and the footer from these frames, so I've prepared this regex:

[^CC][a-zA-Z0-9]+[^DD]

However, it won't make a match for the frame with the content 3. Why? Shouldn't the [a-zA-Z0-9]+ expression cover it? I expect:

datadfhdfghata1
3
datazxczxczxczxdata3

Instead I see:

datadfhdfghata1

datazxczxczxczxdata3

Check out what `[^CC]` means. http://stackoverflow.com/q/22937618/3622940 — Unihedron, Sep 08 '14 at 17:01

score 3 · Accepted Answer · edited May 23 '17 at 11:57


Your regex isn't matching what you expect at all. Here:

Negated character class: any single character that isn't "C" (the second "C" is redundant)
 |
 |    A character from the ranges
 |    |
 |    |           > A character that isn't "D" or "D"
[^CC][a-zA-Z0-9]+[^DD]

This matches one character that isn't "C", followed by one or more characters from a-zA-Z0-9, followed by one character that isn't "D". Both negated classes consume a character of their own, so a sequence is matched only if it is at least three characters long. That logic is not what you want, and it is why the single-character payload "3" is never matched. Change it to this:

CC\K[a-zA-Z0-9]+(?=DD)

Expression explanation:

CC Match the sequence "CC" literally.
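\K Discard everything matched so far, so "CC" is not part of the reported match (supported by PCRE, Perl and Ruby, but not by every regex engine).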
[a-zA-Z0-9]+ Things you want to match.
(?=DD) Asserts that "DD" follows our match.
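The question does not name a language, so the following is only a sketch, assuming Python. Python's built-in `re` module does not support `\K`, so a lookbehind `(?<=CC)` is swapped in for `CC\K`; the effect on these frames is the same. The names `FRAME_BODY` and `strip_frame` are hypothetical.

```python
import re

# (?<=CC) stands in for CC\K: it requires "CC" just before the match
# without including it. (?=DD) requires "DD" right after the match.
FRAME_BODY = re.compile(r"(?<=CC)[a-zA-Z0-9]+(?=DD)")

def strip_frame(frame):
    """Return the payload between the CC header and DD footer, or None."""
    match = FRAME_BODY.search(frame)
    return match.group(0) if match else None

frames = ["CCdatadfhdfghata1DD", "CC3DD", "CCdatazxczxczxczxdata3DD"]
payloads = [strip_frame(frame) for frame in frames]
print(payloads)  # ['datadfhdfghata1', '3', 'datazxczxczxczxdata3']
```

Because the header and footer are only asserted, never consumed, the whole match is already the payload and no group lookup is needed.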

Here is a regex demo.

As a side note, [a-zA-Z0-9] can be replaced with the shorthand class [^\W_], provided \w is ASCII-only in your regex engine.

answered Sep 08 '14 at 17:09 by Unihedron · edited May 23 '17 at 11:57 by Community

This works without needing groups; nice. Messy, but nice. Ultimately the OP *should* be using groups, but this works too. – Qix - MONICA WAS MISTREATED Sep 08 '14 at 17:14

@Qix Exactly - Capturing groups are a perfect fit for this scenario. Unfortunately the implementation is missing, and writing a drop-forth regex would be a code-saver. Messy, but saves code. – Unihedron Sep 08 '14 at 17:16

score 1 · Answer 2 · answered Sep 08 '14 at 16:53


A ^ at the start of your square brackets translates to a NOT operation. So [^CC] actually tells the engine to match one character that is NOT a "C", rather than to skip the "CC" header.
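To see the effect of the negated classes, here is a small sketch, assuming Python (the question does not say which language is used). It runs the original pattern over the three sample frames; `ORIGINAL` and `try_original` are hypothetical names.

```python
import re

# The pattern from the question. [^CC] is the same as [^C]:
# one character that is not "C". It does not mean "skip the CC header".
ORIGINAL = re.compile(r"[^CC][a-zA-Z0-9]+[^DD]")

def try_original(frame):
    """Return what the original pattern matches in the frame, or None."""
    match = ORIGINAL.search(frame)
    return match.group(0) if match else None

frames = ["CCdatadfhdfghata1DD", "CC3DD", "CCdatazxczxczxczxdata3DD"]
results = [try_original(frame) for frame in frames]
print(results)  # ['datadfhdfghata1', None, 'datazxczxczxczxdata3']
```

In "CC3DD" the only non-"C" starting points are "3" and the two "D"s, and from none of them can the engine find three consecutive characters that fit the pattern, so there is no match.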

Try CC([a-zA-Z0-9]+)DD. The parentheses allow you to extract the matched data from the pattern without the CC and DD blocks.
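Reading the group rather than the whole match is the step that removes the header and footer. A minimal sketch, assuming Python and one frame per string as in the question's array (`FRAME` and `payload` are hypothetical names):

```python
import re

FRAME = re.compile(r"CC([a-zA-Z0-9]+)DD")

def payload(frame):
    """Return the contents of capturing group 1, or None if no match."""
    match = FRAME.fullmatch(frame)
    return match.group(1) if match else None

frames = ["CCdatadfhdfghata1DD", "CC3DD", "CCdatazxczxczxczxdata3DD"]
payloads = [payload(frame) for frame in frames]
print(payloads)  # ['datadfhdfghata1', '3', 'datazxczxczxczxdata3']

# group(0) is the whole match and still includes the header and footer.
whole = FRAME.fullmatch("CC3DD").group(0)
print(whole)  # CC3DD
```

Most regex APIs expose the same distinction between the whole match and numbered groups, though the accessor names differ from one language to another.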

answered Sep 08 '14 at 16:53 by Babak Naffas

If I do `CC([a-zA-Z0-9]+)DD` I get the data with the footer and header. No value added. – user1146081 Sep 08 '14 at 17:00

You would need to access the grouped data from your match, since you want to exclude the CC and DD. – Babak Naffas Sep 08 '14 at 17:02
Babak, yes I know that, but any example of doing this? – user1146081 Sep 08 '14 at 17:08
