
I am not good with regex, but I have the following pattern. As I understand it, it looks for 13-16 digits and then succeeds if it finds 3-4 digits after that. The problem is that the 3-4 digits are optional, and they can also come before the 13-16 digit number, so I guess I need to combine positive and negative lookahead/lookbehind. This sounds way too complex. Is there a simpler way?

(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']

which will match the ccnum and the series in the following snippet:

<CreditCard> 
     name="John Doe"
     ccnum="1111123412341231"
     series="339"
     exp="03/13">
</CreditCard>
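To see this behavior concretely, here is a minimal Python sketch (the thread is .NET-flavored; Python's `re` is used only to keep the example self-contained) that runs the pattern above against two of the samples. It assumes the search runs with `re.DOTALL`, the counterpart of .NET's `RegexOptions.Singleline`, because the ccnum and series sit on different lines and `.` does not cross newlines by default. The sample strings are illustrative.

```python
import re

# The question's pattern, unchanged. DOTALL lets ".*?" run across the line
# break between the ccnum and series attributes (assumed setting).
PATTERN = re.compile(
    r"""(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']""",
    re.DOTALL,
)

ccnum_first = """<CreditCard
     name="John Doe"
     ccnum="1111123412341231"
     series="339"
     exp="03/13">
</CreditCard>"""

series_first = """<CreditCard series="139" ccnum="1838383838383833"
</CreditCard>"""

print(PATTERN.search(ccnum_first).groups())  # ('1111123412341231', '339')
print(PATTERN.search(series_first))          # None
```

The second search finds nothing because the pattern only looks for the series after the ccnum.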

However, if I remove the ccnum or the series, it doesn't match anything, even though the series is optional. The series can also appear before or after the ccnum, and if I put the series attribute before the ccnum attribute, it doesn't match anything either. It also doesn't match if the series element is left out, or if the series and ccnum are separate elements with the series first, such as:

<CreditCard> 
<series>234</series>
<ccnum>1235583839293838</ccnum>
</CreditCard>

I need the regex to match the following scenarios, but I do not know the exact names of the elements; in this case, I just called them ccnum and series.

Here are the ones that work:

<CreditCard> 
            <ccnum>1235583839293838</ccnum>
            <series>123</series>
</CreditCard>

<CreditCard ccnum="1838383838383833"> 
            <series>123</series>
</CreditCard>

<CreditCard ccnum="1838383838383833" series="139"
</CreditCard>

It should also match the following, but does not:

<CreditCard ccnum="1838383838383833"
            </CreditCard>

<CreditCard series="139" ccnum="1838383838383833" 
            </CreditCard>

<CreditCard ccnum="1838383838383833"></CreditCard>

<CreditCard> 
    <series>123</series>                
    <ccnum>1235583839293838</ccnum>
</CreditCard>

<CreditCard>          
<ccnum series="123">1235583839293838</ccnum>
</CreditCard>

Right now, to get this to work, I am using 3 separate regular expressions:

One to match a credit card number that comes before a security code.

One to match a security code that comes before a credit card number.

One to match just a credit card number.

I tried combining the expressions with alternation (|), but then I end up with 5 capture groups in total (2 from each of the first two expressions and 1 from the last one).
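The three-expression approach can be kept without juggling five groups by trying each pattern in turn and returning a normalized `(ccnum, series)` pair. This is a Python sketch, not the poster's actual code: the three patterns and the `find_card` helper are hypothetical, values are assumed to be delimited by `>`, `"` or `'` on the left and `<`, `"` or `'` on the right (as in the question's pattern), and each string is assumed to hold a single card, since with `re.DOTALL` the `.*?` could otherwise pair a number with the next card's series.

```python
import re

# Hypothetical versions of the three expressions described above.
CC_THEN_SERIES = re.compile(r"""[>"'](\d{13,16})[<"'].*?[>"'](\d{3,4})[<"']""", re.DOTALL)
SERIES_THEN_CC = re.compile(r"""[>"'](\d{3,4})[<"'].*?[>"'](\d{13,16})[<"']""", re.DOTALL)
CC_ONLY = re.compile(r"""[>"'](\d{13,16})[<"']""")


def find_card(text):
    """Return (ccnum, series) or None; series is None when no code is found."""
    m = CC_THEN_SERIES.search(text)
    if m:
        return m.group(1), m.group(2)
    m = SERIES_THEN_CC.search(text)
    if m:
        # The groups are in the opposite order here, so swap them back.
        return m.group(2), m.group(1)
    m = CC_ONLY.search(text)
    if m:
        return m.group(1), None
    return None


print(find_card('<CreditCard ccnum="1838383838383833"></CreditCard>'))
# ('1838383838383833', None)
```

The order of the checks matters: the ccnum-only pattern is tried last so that it never hides a series that is actually present.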

tchrist
Xaisoft
  • What is wrong with `\d{13,16}` ? – leppie Jan 27 '12 at 21:05
  • @leppie - That just matches the ccnum, not the optional series number – Xaisoft Jan 27 '12 at 21:07
  • Then you should not be using regex for this. Just use an XML parser and validate the values with regex. You can even use XML Schema to validate values. – leppie Jan 27 '12 at 21:08
  • the elements and attributes vary, so I can't use an XML parser. – Xaisoft Jan 27 '12 at 23:32
  • `the elements and attributes vary, so I can't use an XML parser` is a non-sequitur. You just wouldn't use Schema validation? Use an XmlReader or just a general XPath query to locate the text nodes and work on them. Or consider writing a full parser for your grammar (since that is what this is). – sehe Jan 31 '12 at 22:07

3 Answers

(?<=[>\"'](\\d{3,4})[<\"'].{0,100})?[>\"'](\\d{13,16})[<\"'](?=.*[>\"'](\\d{3,4})[<\"'])?

This will create three capture groups, where the ccnum is always in the second group, and the series can be in the first, the third, or none of the groups.

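ccnum = match.Groups[2].Value;
// the series is in group 1 (before the ccnum), group 3 (after it), or neither;
// a group that did not participate has Value == "", so concatenating is safe
series = match.Groups[1].Value + match.Groups[3].Value;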
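The pattern above is written as the contents of a C# string literal (hence `\\d` and `\"`), and its `.{0,100}` lookbehind relies on .NET's support for variable-length lookbehind. Python's `re` rejects variable-width lookbehind, so this hedged Python sketch emulates the same idea a different way: find the ccnum first, then look for a series in the 100 characters before it and anywhere after it. `find_with_window` is a hypothetical name, and unlike `.*` without Singleline, the search after the ccnum is not limited to one line.

```python
import re

# Value patterns from the answer, without the lookarounds.
CC = re.compile(r"""[>"'](\d{13,16})[<"']""")
SERIES = re.compile(r"""[>"'](\d{3,4})[<"']""")
WINDOW = 100  # mirrors the .{0,100} inside the lookbehind


def find_with_window(text):
    """Return (ccnum, series_before, series_after); a missing series is ''."""
    cc = CC.search(text)
    if cc is None:
        return None
    # Last series-like value in the WINDOW characters before the ccnum.
    before = SERIES.findall(text[max(0, cc.start() - WINDOW):cc.start()])
    # First series-like value anywhere after the ccnum.
    after = SERIES.search(text, cc.end())
    return (
        cc.group(1),
        before[-1] if before else "",
        after.group(1) if after else "",
    )


ccnum, s_before, s_after = find_with_window(
    '<CreditCard series="139" ccnum="1838383838383833"\n</CreditCard>'
)
series = s_before + s_after  # same trick as Groups[1].Value + Groups[3].Value
print(ccnum, series)  # 1838383838383833 139
```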
Sergiu Dumitriu

It is probably much easier to pull the XML into an XDocument using its Parse method. Then you can use XPath or other means of finding that data.

As for the regex: your regex is too complex for me to comprehend, but this is how you make a certain block optional: "(thisisoptional)?".

And you cannot account for the two different orders except by including both orders manually in the regex. So if you want to match both "ab" and "ba" (different order), you need the following regex: "((ab)|(ba))". So everything appears twice. You can reduce the nastiness of this by factoring out "a" and "b" into a string variable each.
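Both suggestions can be combined in a short Python sketch: each piece is written once as a string variable, the series is made optional after the ccnum with `(...)?`, and the two orders are listed as alternatives. Python's `re` does not allow the same group name twice, so the sketch collapses the four numbered groups afterwards. The names `OPEN`, `CLOSE`, `CC`, `SERIES`, `GAP` and `card` are illustrative, and one card per string is assumed.

```python
import re

# Each building block is written once and reused (illustrative names).
OPEN = r"""[>"']"""     # character before a value
CLOSE = r"""[<"']"""    # character after a value
CC = OPEN + r"(\d{13,16})" + CLOSE
SERIES = OPEN + r"(\d{3,4})" + CLOSE
GAP = r".*?"            # anything in between, as little as possible

# Branch 1: ccnum, optionally followed by a series  -> groups 1, 2
# Branch 2: series followed by a ccnum              -> groups 3, 4
PATTERN = re.compile(f"{CC}(?:{GAP}{SERIES})?|{SERIES}{GAP}{CC}", re.DOTALL)


def card(text):
    """Return (ccnum, series) for the first card found; series may be None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    cc1, series1, series2, cc2 = m.groups()
    # Only one branch takes part in a match, so the other branch's groups are None.
    return cc1 or cc2, series1 or series2


print(card('<CreditCard series="139" ccnum="1838383838383833"\n</CreditCard>'))
# ('1838383838383833', '139')
```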

usr

You could try recursively traversing the XML document, checking every attribute value and text node against your expressions for ccnum and series, and appending the matches to a List<string> ccNumList and a List<string> seriesList respectively. If ccnum and series appear in the same order in the DOM tree, then ccNumList[i] and seriesList[i] belong to the same card.

An example of doing a recursive tree traversal is here.
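A minimal sketch of that traversal using Python's `xml.etree.ElementTree`, assuming the input is well-formed XML (some samples in the question, such as a start tag missing its closing `>`, are not, and the parser would reject them). Only values are checked, never element or attribute names; the two value patterns are assumptions, and the list names simply mirror `ccNumList` and `seriesList`.

```python
import re
import xml.etree.ElementTree as ET

# Assumed value rules: a whole value of 13-16 digits is a ccnum, 3-4 digits a series.
CCNUM_RE = re.compile(r"\d{13,16}")
SERIES_RE = re.compile(r"\d{3,4}")


def classify(value, cc_num_list, series_list):
    """Append value to the matching list; other values are ignored."""
    if CCNUM_RE.fullmatch(value):
        cc_num_list.append(value)
    elif SERIES_RE.fullmatch(value):
        series_list.append(value)


def collect(elem, cc_num_list, series_list):
    """Depth-first walk: attribute values, then the element's text, then children."""
    for value in elem.attrib.values():
        classify(value, cc_num_list, series_list)
    if elem.text:
        classify(elem.text.strip(), cc_num_list, series_list)
    for child in elem:
        collect(child, cc_num_list, series_list)


doc = """<Cards>
  <CreditCard><series>234</series><ccnum>1235583839293838</ccnum></CreditCard>
  <CreditCard ccnum="1838383838383833" series="139"></CreditCard>
</Cards>"""

cc_num_list, series_list = [], []
collect(ET.fromstring(doc), cc_num_list, series_list)
print(list(zip(cc_num_list, series_list)))
# [('1235583839293838', '234'), ('1838383838383833', '139')]
```

Pairing by index assumes every card has both values; a card with no series would shift every later pair.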

Community
  • ccnum and series are just examples; the names vary from XML to XML, so I can't parse it with the XML parser. – Xaisoft Jan 27 '12 at 23:33
  • @Xaisoft - It doesn't matter what the element and attribute names are if you just check each text node and attribute against your regular expressions for a ccnum and series. The key point is that since the numbers will appear in the same order in the file (which is true unless the file has some sort of secondary id for them), the two lists should be in the same order. – Jan 28 '12 at 00:22