I have files like the one below. I want to validate that the first line starts with "ZZ=" and that the third record contains only "XXX".

If this pattern doesn't match, I reject the file.

File example

ZZ=101
OO
XXX
111111111111
222222222222

00000000000
AAAAAAAAAAAA


Valid file example


ZZ=101
OO
XXX
111111111111
222222222222

00000000000
AAAAAAAAAAAA


Not valid file


ZZ=101
OO
XSS  (should be rejected: only XXX is allowed here)
111111111111
222222222222

00000000000
AAAAAAAAAAAA


Not valid file


HH=101 (should be rejected: the first line must start with ZZ=)
OO
XXX  (this line is fine, but the file is still rejected because of the first line)
111111111111
222222222222

00000000000
AAAAAAAAAAAA

Is there any way I could use regex to match this pattern?

Any suggestions, please?

Thanks!

kelly
  • What tool/language are you using? Where is `DDE` in your example text? – Wiktor Stribiżew Dec 02 '16 at 13:16
  • @WiktorStribiżew I have updated the question: it's XXX, not DDE. I'm using an ETL tool with a little Java. – kelly Dec 02 '16 at 13:19
  • Not sure, try `(?m)^ZZ=.*\r?\n.*\r?\nXXX(?:\r?\n.+)*(?:\r?\n){2}.*\r?\n.*` ([demo](https://regex101.com/r/fpo7ZT/1)). – Wiktor Stribiżew Dec 02 '16 at 13:24
  • @WiktorStribiżew Here is what my file pattern looks like; it can have any number of records: http://prnt.sc/dehz9x Once the first and third records match, is there any way to also match whatever data comes after the third record, no matter how many records there are, including spaces, tabs, etc.? – kelly Dec 02 '16 at 15:06
  • You want to match all *valid* records from the start of the file, then skip the first invalid entry, and then match all subsequent records? That is *impossible* to match with 1 regex. Maybe all you need is to find the first invalid entry and remove it using a regex replace operation? Try that with [`^ZZ=.*\r?\n.*\r?\n(?!XXX\b).*(?:\r?\n.+)*(?:\r?\n){2}.*\r?\n.*\s*`](https://regex101.com/r/qPY7jX/1) regex. – Wiktor Stribiżew Dec 05 '16 at 09:39
  • Any feedback??? – Wiktor Stribiżew Dec 06 '16 at 14:19

2 Answers

Assuming that by "records" you mean lines separated by "\r" and/or "\n", and each dataset is in a separate file, it would already be sufficient to check for this pattern:

^ZZ=.*[\r\n]{1,2}.*[\r\n]{1,2}XXX

If instead all datasets are merged together in one file, the pattern proposed by Wiktor Stribiżew in the comments above seems to cover what you are looking for more precisely.

EDIT:

This should match anything after the above pattern:

^ZZ=.*[\r\n]{1,2}.*[\r\n]{1,2}XXX[\s\S]*

(I also corrected [\r\n] to [\r\n]{1,2})
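Since the asker mentions Java, here is a minimal sketch of the same check in Java, assuming the whole file has already been read into a string and that lines end in `\n` or `\r\n`. The class name `FileHeaderCheck` and the method `isValid` are made up for illustration. It is a little stricter than the pattern above: `[\r\n]{1,2}` can also consume two consecutive `\n` characters (a blank line), and the pattern above does not require the third line to end right after `XXX`, so this variant uses `\r?\n` and allows only a line break or the end of the input after `XXX`.

```java
import java.util.regex.Pattern;

class FileHeaderCheck {
    // One line break: Unix (\n) or Windows (\r\n). Bare \r is not handled here.
    private static final String EOL = "\\r?\\n";

    // Line 1 starts with "ZZ=", line 2 is free-form, line 3 is exactly "XXX".
    // The optional tail accepts everything after line 3, blank lines included.
    private static final Pattern HEADER = Pattern.compile(
            "ZZ=.*" + EOL + ".*" + EOL + "XXX(?:" + EOL + "[\\s\\S]*)?");

    // matches() must consume the whole input, so no ^ or $ anchors are needed.
    static boolean isValid(String content) {
        return HEADER.matcher(content).matches();
    }

    public static void main(String[] args) {
        String tail = "\n111111111111\n222222222222\n\n00000000000\nAAAAAAAAAAAA\n";
        System.out.println(isValid("ZZ=101\nOO\nXXX" + tail)); // accepted
        System.out.println(isValid("ZZ=101\nOO\nXSS" + tail)); // rejected: line 3
        System.out.println(isValid("HH=101\nOO\nXXX" + tail)); // rejected: line 1
    }
}
```

If trailing spaces after `XXX` should be tolerated, the pattern would need something like `[ \t]*` after `XXX`; the sketch treats "only XXX" literally.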

friedemann_bach
  • @friedemann_bach Is there any way, after matching the first and third records, to match everything that comes after the third record (new lines, empty lines, spaces, tabs, etc.), no matter how many records follow it? https://regex101.com/r/LuMI9e/1 – kelly Dec 02 '16 at 15:16
  • Answer updated. – friedemann_bach Dec 02 '16 at 15:55

Try this

/ZZ=\d+.+XXX.+/s

/ - Delimiter
ZZ=\d+.+XXX.+ - Pattern
/ - End Delimiter
s - Single-line (dotall) modifier, so that . also matches newlines

I tested it in PHP and it works:

https://regex101.com/r/KXlE1q/1
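For anyone who wants to try this pattern from Java rather than PHP, a rough equivalent might look like the sketch below. `Pattern.DOTALL` stands in for the `s` modifier and `find()` searches anywhere in the input; the class name `LoosePatternDemo` and the method `looseMatch` are invented for the example. One caveat: the pattern is not anchored to line positions, so it also accepts input where `ZZ=` is not on the first line or where `XXX` is only part of a later line.

```java
import java.util.regex.Pattern;

class LoosePatternDemo {
    // Java counterpart of the PHP pattern /ZZ=\d+.+XXX.+/s:
    // Pattern.DOTALL plays the role of the trailing "s" modifier.
    private static final Pattern LOOSE =
            Pattern.compile("ZZ=\\d+.+XXX.+", Pattern.DOTALL);

    // find() looks for the pattern anywhere in the input.
    static boolean looseMatch(String content) {
        return LOOSE.matcher(content).find();
    }

    public static void main(String[] args) {
        // The valid sample from the question (shortened).
        System.out.println(looseMatch("ZZ=101\nOO\nXXX\n111111111111\n"));
        // "ZZ=" on line 2 and "XXX" only as part of line 5.
        System.out.println(looseMatch("AA\nZZ=101\nOO\nfoo\nXXXY\nbar\n"));
    }
}
```

Both calls return true, so this pattern is best suited to a quick check that does not need to reject files with the markers in the wrong place.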

Vijay Wilson