I have to decode frames embedded in a long string. Each frame begins with "CC" and ends with "DD". I'd like to capture everything between the header and the footer exactly as it is.

I've found all the frames and put them into an array. A sample of the array looks like:

CCdatadfhdfghata1DD
CC3DD
CCdatazxczxczxczxdata3DD

Now I'd like to strip the header and the footer from these frames, so I've prepared this regex:

[^CC][a-zA-Z0-9]+[^DD]

However, it doesn't match the frame whose content is 3. Why? Shouldn't the [a-zA-Z0-9]+ expression cover it? I expect:

datadfhdfghata1
3
datazxczxczxczxdata3

Instead I see:

datadfhdfghata1

datazxczxczxczxdata3
Unihedron
user1146081

2 Answers

Your regex isn't matching what you expect at all. Here:

Negated character class: any one character that isn't "C" (the second "C" is redundant)
 |
 |    A character from the ranges
 |    |
 |    |           > A character that isn't "D" or "D"
[^CC][a-zA-Z0-9]+[^DD]

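This matches one character that isn't "C", then one or more characters from a-zA-Z0-9, then one character that isn't "D", and all three parts are included in the match. The logic is wrong for your purpose: a match needs at least three consecutive characters, with the first not "C" and the last not "D". In CC3DD the only character that is neither "C" nor "D" is 3, so nothing matches. Change it to this: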

CC\K[a-zA-Z0-9]+(?=DD)

Expression explanation:

  • CC Match the sequence "CC" literally.
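  • \K Discard everything matched so far from the reported match, so "CC" is required but not returned. Not every regex engine supports \K.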
  • [a-zA-Z0-9]+ Things you want to match.
  • (?=DD) Asserts that "DD" follows our match.

Here is a regex demo.

As a side note, [a-zA-Z0-9] can be replaced with the shorthand class [^\W_], provided \w is ASCII-only in your engine or mode.
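To make the difference concrete, here is a minimal Python sketch that runs both patterns over the sample frames from the question. Python's built-in re module does not support \K, so the sketch swaps in a lookbehind, (?<=CC), which gives the same result on this data; that substitution, the helper name first_match, and the variable names are assumptions for illustration, not part of the original pattern.

```python
import re

# Sample frames copied from the question.
frames = ["CCdatadfhdfghata1DD", "CC3DD", "CCdatazxczxczxczxdata3DD"]

# Original pattern: one non-"C" char, one or more alphanumerics, one non-"D" char.
original = re.compile(r"[^CC][a-zA-Z0-9]+[^DD]")

# Stand-in for CC\K[a-zA-Z0-9]+(?=DD): a lookbehind replaces \K,
# because Python's re module has no \K.
fixed = re.compile(r"(?<=CC)[a-zA-Z0-9]+(?=DD)")


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


original_results = [first_match(original, f) for f in frames]
fixed_results = [first_match(fixed, f) for f in frames]

print(original_results)  # the middle frame yields None
print(fixed_results)     # every frame yields its payload
```

The original pattern finds nothing in CC3DD because it needs three consecutive characters that fit its classes, while the lookaround version only requires one alphanumeric character between the markers.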

Community
Unihedron
  • This works without needing groups; nice. Messy, but nice. Ultimately the OP *should* be using groups, but this works too. – Qix - MONICA WAS MISTREATED Sep 08 '14 at 17:14
  • @Qix Exactly - Capturing groups are a perfect fit for this scenario. Unfortunately the implementation is missing, and writing a drop-forth regex would be a code-saver. Messy, but saves code. – Unihedron Sep 08 '14 at 17:16
A ^ at the start of a square-bracket character class negates it. So [^CC] actually tells the engine to match a single character that is NOT "C", not to skip a leading "CC".

Try CC([a-zA-Z0-9]+)DD. The parentheses form a capture group, which lets you extract the matched data without the CC and DD markers.
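As a rough illustration of reading the capture group, the Python sketch below applies the pattern to each frame in the array. It assumes every array element holds exactly one frame, which is why it uses fullmatch; the names FRAME and payload are hypothetical.

```python
import re

# Group 1 captures the payload between the "CC" header and the "DD" footer.
FRAME = re.compile(r"CC([a-zA-Z0-9]+)DD")

# Sample frames copied from the question.
frames = ["CCdatadfhdfghata1DD", "CC3DD", "CCdatazxczxczxczxdata3DD"]


def payload(frame):
    """Return the text between the markers, or None if the frame is malformed."""
    m = FRAME.fullmatch(frame)  # assumes the whole string is a single frame
    return m.group(1) if m else None


payloads = [payload(f) for f in frames]
print(payloads)
```

Returning None for a string that is not a well-formed frame, such as an empty CCDD, keeps bad input visible instead of raising on a missing match.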

Babak Naffas