4

I'm busy with GS1-128 and want to match scanned barcodes using RegEx. I currently have the following expression:

^(01)(12345678)(\d{5})\d(11|17)(\d{2}[0-1]\d[0-3]\d)(10|21)(\d{1,20})(30)(\d{1,20})

This successfully matches the barcode (01)12345678123450(11)130500(21)1234567890(30)42, splitting it up into the following groups:

  1. 01 - GTIN
  2. 12345678 - company code (dummy) - 8 digits
  3. 12345 - partcode (dummy) - 5 digits
  4. 11 or 17 - Production date/expiry date
  5. 130500 - date - 6 digits
  6. 10 or 21 - batch/serial number
  7. 1234567890 - 1 to 20 characters
  8. 30 - count of items (optional)
  9. 42 - 1 to 8 characters (optional)

Now, I sometimes have a barcode that doesn't have the count-of-items AI (30). I can't figure out how to work this into my regex. Whenever I make groups 8 and 9 optional, their content gets swallowed by group 7 for all barcodes that do contain AI 30.

How do I go about making AI 30 optional while preventing it from being grouped with AI 21/10?

Test cases:

(01)12345678654320(11)120500(21)1234567890 should give the following matches:

  1. 01
  2. 12345678
  3. 65432
  4. 11
  5. 120500
  6. 21
  7. 1234567890
  8. NO MATCH
  9. NO MATCH

(01)12345678124570(17)130700(10)30567(30)50 should give the following matches:

  1. 01
  2. 12345678
  3. 12457
  4. 17
  5. 130700
  6. 10
  7. 30567
  8. 30
  9. 50

(01)12345678888880(11)140200(21)66503042(30)100 should give the following matches:

  1. 01
  2. 12345678
  3. 88888
  4. 11
  5. 140200
  6. 21
  7. 66503042
  8. 30
  9. 100

Note that the parentheses are only there to show where each AI begins; the barcode itself omits them.

Terry Burton
  • 3
    Regarding the detection of sections 8 and 9: if you had to do this programmatically using string manipulation (without REGEX) how would you do it? It seems pretty ambiguous to me. – Cristian Lupascu Jul 01 '13 at 10:14
  • Try making group 7 non-greedy, like `\d{1,20}?`. That gives precedence to group 8. – Martin Ender Jul 01 '13 at 10:16
  • Just a thought: is it possible that group 7 could contain `30` among its 1 to 20 characters? – Cristian Lupascu Jul 01 '13 at 10:17
  • @w0lf I previously used string manipulation for group 1 to 5, which was just substrings because they never changed length. I got asked to add batch/serial and count recently which is why I'm now doing this with REGEX. Group 7 contains 30 amongst its characters the moment I make 30 optional. –  Jul 01 '13 at 10:19
  • @m.buettner Tried that, it only causes group 7 to match a single character (turbo-lazy!) and AI 30 gets ignored altogether, whether the barcode contains it or not –  Jul 01 '13 at 10:20
  • 2
    @Quatroking I wasn't asking from the REGEX matching point of view. I meant: what if group 7 would *really* contain `30` and after that we could have another `30` (optional)? That could make the specification ambiguous. – Cristian Lupascu Jul 01 '13 at 10:22
  • 1
    @Quatroking also, it would be great if you could provide some test cases and the expected results for each of them. – Cristian Lupascu Jul 01 '13 at 10:23
  • Added three examples, I hope this is clear enough. –  Jul 01 '13 at 10:31
  • @Quatroking OK, but let's consider the following test case: `(01)12345678124570(17)130700(10)993099` (where `993099` is group 7 and groups 8 and 9 are missing). Is such a case possible? If it is, then the rules are ambiguous. – Cristian Lupascu Jul 01 '13 at 10:54
  • @w0lf Yes, such a case is possible. Not all barcodes make use of AI 30. –  Jul 01 '13 at 11:07
  • @Quatroking yes, but due to the fact that there's a `30` inside group 7, that could also be interpreted as: group 7=`99`, group 8=`30`, group 9=`99` without abusing the rules. – Cristian Lupascu Jul 01 '13 at 11:09
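
The ambiguity raised in these comments can be made concrete with a small sketch. It is only an illustration: `candidate_splits` is a hypothetical helper, and the length limits (1 to 20 digits for group 7, 1 to 8 for the count) are taken from the group list in the question. It lists every way the digits after AI 10/21 could be read when the parentheses are absent.

```python
from typing import Optional


def candidate_splits(tail: str) -> list[tuple[str, Optional[str]]]:
    """Return every (group7, count) reading of the digits after AI 10/21.

    count is None when the tail is read as containing no AI 30 at all.
    """
    splits: list[tuple[str, Optional[str]]] = []
    # Reading 1: no AI 30, the whole tail is the batch/serial number.
    if 1 <= len(tail) <= 20:
        splits.append((tail, None))
    # Reading 2: some occurrence of "30" is the AI and the rest is the count.
    start = tail.find("30", 1)  # group 7 needs at least one digit before it
    while start != -1:
        serial, count = tail[:start], tail[start + 2:]
        if len(serial) <= 20 and 1 <= len(count) <= 8:
            splits.append((serial, count))
        start = tail.find("30", start + 1)
    return splits


for tail in ("1234567890", "993099", "305673050"):
    print(tail, candidate_splits(tail))
```

For `993099` this yields two readings, `("993099", None)` and `("99", "99")`, so no pattern working on the bare digits can tell them apart without extra information.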

3 Answers

2

Try this:

^(?<gtin>\(01\))(?<comp_code>12345678)(?<part_code>\d{5})0?(?<pd_ed>\((?:11|17)\))(?<date>\d{6})(?<bat_no>\((?:21|10)\))(?<data_req>\d{1,20}?)\b(?<count>(?:\(30\))?)(?<data_opt>(?:\d{1,8})?)$

The above expression should match all the following items:

(01)12345678654320(11)120500(21)1234567890
(01)12345678124570(17)130700(10)30567(30)50
(01)12345678888880(11)140200(21)66503042(30)100

Explanation:

<!--
^(?<gtin>\(01\))(?<comp_code>12345678)(?<part_code>\d{5})0?(?<pd_ed>\((?:11|17)\))(?<date>\d{6})(?<bat_no>\((?:21|10)\))(?<data_req>\d{1,20}?)\b(?<count>(?:\(30\))?)(?<data_opt>(?:\d{1,8})?)$

Assert position at the beginning of the string «^»
Match the regular expression below and capture its match into backreference with name “gtin” «(?<gtin>\(01\))»
   Match the character “(” literally «\(»
   Match the characters “01” literally «01»
   Match the character “)” literally «\)»
Match the regular expression below and capture its match into backreference with name “comp_code” «(?<comp_code>12345678)»
   Match the characters “12345678” literally «12345678»
Match the regular expression below and capture its match into backreference with name “part_code” «(?<part_code>\d{5})»
   Match a single digit 0..9 «\d{5}»
      Exactly 5 times «{5}»
Match the character “0” literally «0?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference with name “pd_ed” «(?<pd_ed>\((?:11|17)\))»
   Match the character “(” literally «\(»
   Match the regular expression below «(?:11|17)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «11»
         Match the characters “11” literally «11»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «17»
         Match the characters “17” literally «17»
   Match the character “)” literally «\)»
Match the regular expression below and capture its match into backreference with name “date” «(?<date>\d{6})»
   Match a single digit 0..9 «\d{6}»
      Exactly 6 times «{6}»
Match the regular expression below and capture its match into backreference with name “bat_no” «(?<bat_no>\((?:21|10)\))»
   Match the character “(” literally «\(»
   Match the regular expression below «(?:21|10)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «21»
         Match the characters “21” literally «21»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «10»
         Match the characters “10” literally «10»
   Match the character “)” literally «\)»
Match the regular expression below and capture its match into backreference with name “data_req” «(?<data_req>\d{1,20}?)»
   Match a single digit 0..9 «\d{1,20}?»
      Between one and 20 times, as few times as possible, expanding as needed (lazy) «{1,20}?»
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference with name “count” «(?<count>(?:\(30\))?)»
   Match the regular expression below «(?:\(30\))?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match the character “(” literally «\(»
      Match the characters “30” literally «30»
      Match the character “)” literally «\)»
Match the regular expression below and capture its match into backreference with name “data_opt” «(?<data_opt>(?:\d{1,8})?)»
   Match the regular expression below «(?:\d{1,8})?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match a single digit 0..9 «\d{1,8}»
         Between one and 8 times, as many times as possible, giving back as needed (greedy) «{1,8}»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
-->

EDIT

Omitted escaped parens:

^(?<gtin>01)(?<comp_code>12345678)(?<part_code>\d{5})0?(?<pd_ed>(?:11|17))(?<date>\d{6})(?<bat_no>(?:21|10))(?<data_req>\d{1,20}?)\b(?<count>(?:30)?)(?<data_opt>(?:\d{1,8})?)$
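
A quick way to compare the two patterns is sketched below in Python. This is an assumption about tooling, since the thread does not say which regex engine is in use; the only change to the patterns is that `(?<name>...)` is written as Python's `(?P<name>...)`. Because digits are word characters, `\b` can only succeed after `data_req` where a non-digit (or the end of the string) follows, which is why the version without parentheses behaves differently on raw scanner data.

```python
import re

# First pattern from this answer, expecting the literal parentheses.
WITH_PARENS = re.compile(
    r"^(?P<gtin>\(01\))(?P<comp_code>12345678)(?P<part_code>\d{5})0?"
    r"(?P<pd_ed>\((?:11|17)\))(?P<date>\d{6})(?P<bat_no>\((?:21|10)\))"
    r"(?P<data_req>\d{1,20}?)\b(?P<count>(?:\(30\))?)(?P<data_opt>(?:\d{1,8})?)$"
)
# Edited pattern with the escaped parentheses removed.
WITHOUT_PARENS = re.compile(
    r"^(?P<gtin>01)(?P<comp_code>12345678)(?P<part_code>\d{5})0?"
    r"(?P<pd_ed>(?:11|17))(?P<date>\d{6})(?P<bat_no>(?:21|10))"
    r"(?P<data_req>\d{1,20}?)\b(?P<count>(?:30)?)(?P<data_opt>(?:\d{1,8})?)$"
)

labelled = "(01)12345678124570(17)130700(10)30567(30)50"
raw = labelled.replace("(", "").replace(")", "")  # what the scanner delivers

with_parens = WITH_PARENS.match(labelled).groupdict()
without_parens = WITHOUT_PARENS.match(raw).groupdict()

print(with_parens["data_req"], with_parens["count"], with_parens["data_opt"])
print(without_parens["data_req"], without_parens["count"], without_parens["data_opt"])
```

On the labelled string, `data_req` is `30567` and `data_opt` is `50`. On the raw digits, the only word boundary is at the end of the string, so `data_req` grows to `305673050` and the other two groups come back empty.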
Cylian
  • 1
    I have the feeling that the parentheses included in the test cases were there only to emphasize the distinction between the groups. Judging by the OP's REGEX, I think they are not included in the real data. – Cristian Lupascu Jul 01 '13 at 11:25
  • w0lf is correct, the parentheses are only there to represent the AIs. However, I removed the parentheses from your REGEX and added my own company code, and it's working great! Thanks! EDIT: Actually, after reviewing, removing the parentheses causes AI 30 to fall back into AI 21/10 again :/ –  Jul 01 '13 at 11:29
1

Variable length AIs should be terminated with an FNC1 character. If this is present, you should use this to find the end of the \d{1,20}.

If it is not present, I'd find out where it got stripped and prevent it getting stripped.
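
As a rough sketch of the first case, here is the question's pattern adapted in Python so that group 7 stops at the separator. It assumes the scanner passes FNC1 through as the ASCII group separator (character 29, `\x1d`), which is a common configuration but should be checked on the actual device. The `GS` and `PATTERN` names are only illustrative.

```python
import re

GS = "\x1d"  # assumption: FNC1 arrives as ASCII 29 (group separator)

PATTERN = re.compile(
    r"^(01)(12345678)(\d{5})\d"    # AI 01, company code, part code, check digit
    r"(11|17)(\d{6})"              # production/expiry date AI and date
    r"(10|21)([^\x1d]{1,20})"      # batch/serial: runs until FNC1 or end of data
    r"(?:\x1d(30)(\d{1,8}))?$"     # optional AI 30 and count after the separator
)

with_count = "0112345678124570" "17130700" "10" "30567" + GS + "30" "50"
no_count = "0112345678124570" "17130700" "10" "993099"

print(PATTERN.match(with_count).groups())
print(PATTERN.match(no_count).groups())
```

With the separator in place, a `30` inside the batch/serial number is no longer a problem: `993099` stays in group 7 and groups 8 and 9 are `None`.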

weston
0

Try this one. It matches each segment separately, e.g. for (3302)103300(10)L20060831A117 or (02)04008577004106(15)081231(37)000025:

\(\d{1,3}[0-9,a-z,A-Z]{1}?\)[\d,a-z,A-Z]{1,70}

After that, you can apply this regex

\((.*?)\)

to each segment to find which AI the data part belongs to. You can then verify whether the data part meets the conditions of its AI.
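
The two steps could be combined roughly as follows. This is a sketch in Python with a hypothetical helper name, `split_segments`, and it only applies when the parentheses are actually present in the string, which the question says is not the case for the raw scanner data.

```python
import re

# Step 1: one match per "(AI)data" segment (pattern copied from above).
SEGMENT = re.compile(r"\(\d{1,3}[0-9,a-z,A-Z]{1}?\)[\d,a-z,A-Z]{1,70}")
# Step 2: pull the AI out of the leading parentheses of a segment.
AI = re.compile(r"\((.*?)\)")


def split_segments(barcode: str) -> list[tuple[str, str]]:
    """Return (ai, data) pairs for a human-readable GS1 string."""
    pairs = []
    for segment in SEGMENT.findall(barcode):
        ai = AI.match(segment)
        # Everything after the closing parenthesis is the data part.
        pairs.append((ai.group(1), segment[ai.end():]))
    return pairs


print(split_segments("(3302)103300(10)L20060831A117"))
print(split_segments("(02)04008577004106(15)081231(37)000025"))
```

Each pair can then be checked against the rules for its AI, for example a length limit or a date format.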

Burak Senel