I have the following output string to match using regex:

person1 | Age 20 | M |Gender Male
person2 | Age 11 |   |Gender Female
person3 | Age 23 | M |Gender Female
person4 | Age 32 |   |Gender Male
person5 | Age 41 | M |Gender Male
person11| Age 28 | M |Gender Female
person12| Age 31 | M |Gender Male
person10| Age 33 |   |Gender Male
person8 | Age 26 |   |Gender Male

In the Java code, I am using the following regex to match the above output:

"person1[^\n]*Age 20[^\n]*M[^\n]*Gender Male[^\n]*" +
"person3[^\n]*Age 23[^\n]*M[^\n]*Gender Female[^\n]*" +
"person5[^\n]*Age 41[^\n]*M[^\n]*Gender Male[^\n]*"   +
"person11[^\n]*Age 28[^\n]*M[^\n]*Gender Female[^\n]*"  +
"person12[^\n]*Age 31[^\n]*M[^\n]*Gender Male[^\n]*"

But the problem is that the output is not always in the same order. Sometimes it looks like this:

person1 | Age 20 | M |Gender Male
person2 | Age 11 |   |Gender Female
person3 | Age 23 | M |Gender Female
person4 | Age 32 |   |Gender Male
person11| Age 28 | M |Gender Female
person12| Age 31 | M |Gender Male
person5 | Age 41 | M |Gender Male
person10| Age 33 |   |Gender Male
person8 | Age 26 |   |Gender Male

If I use the same regex on this output, it does not match.

Is there any way to fix this problem?

ratzip
  • What is it you are exactly trying to match here? Or rather, what is your end goal? – tenub Jan 30 '14 at 17:00
  • The output string is always changing. I want to match all the entries with "M" in the third column, but in the second output the order of the entries has changed, so the same regex no longer matches. How can I fix it? – ratzip Jan 30 '14 at 17:09

1 Answer

I highly recommend splitting each line on | and turning it into an object that you can inspect.
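A minimal sketch of that split-and-parse idea, assuming every row has exactly four columns in the layout shown in the question. The class name `PersonLineParser`, the `Person` type and its field names are made up for illustration; they are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;

public class PersonLineParser {

    /** One parsed row. Field names are illustrative assumptions. */
    static final class Person {
        final String name;
        final int age;
        final boolean marked;   // true when the third column holds "M"
        final String gender;

        Person(String name, int age, boolean marked, String gender) {
            this.name = name;
            this.age = age;
            this.marked = marked;
            this.gender = gender;
        }
    }

    /** Parses a row such as "person1 | Age 20 | M |Gender Male". */
    static Person parseLine(String line) {
        // "|" is a regex metacharacter, so it is escaped; -1 keeps empty columns.
        String[] cols = line.split("\\|", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException("Expected 4 columns: " + line);
        }
        String name = cols[0].trim();
        // Assumes the column starts with the literal prefix "Age".
        int age = Integer.parseInt(cols[1].trim().substring("Age".length()).trim());
        boolean marked = cols[2].trim().equals("M");
        // Assumes the column starts with the literal prefix "Gender".
        String gender = cols[3].trim().substring("Gender".length()).trim();
        return new Person(name, age, marked, gender);
    }

    /** Parses every non-blank line of the output. */
    static List<Person> parseAll(String output) {
        List<Person> people = new ArrayList<>();
        for (String line : output.split("\\R")) {
            if (!line.trim().isEmpty()) {
                people.add(parseLine(line));
            }
        }
        return people;
    }

    public static void main(String[] args) {
        String output =
            "person1 | Age 20 | M |Gender Male\n" +
            "person2 | Age 11 |   |Gender Female\n" +
            "person11| Age 28 | M |Gender Female\n";
        for (Person p : parseAll(output)) {
            if (p.marked) {
                System.out.println(p.name + " " + p.age + " " + p.gender);
            }
        }
    }
}
```

Once the rows are objects, checks such as "person1 has age 20" become ordinary field comparisons and no longer depend on the order of the lines.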

But if you simply want to parse this with regex, you could use this:

^ *person *(?<person>.*?) *\| *age *(?<age>.*?) *\| *(?<someMarker>.*?) *\|gender *(?<gender>.*)$

http://regex101.com/r/pA4eP7

Note that in a Java string literal every backslash must itself be escaped, so \| in the example would become \\|.

Also make sure you're using the regex case insensitive modifier.
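As a rough illustration, the pattern above could be used from Java as follows. The class name `RowRegexDemo` is hypothetical, and adding `Pattern.MULTILINE` is an assumption made here so that `^` and `$` apply to each line when the whole output is scanned at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowRegexDemo {

    // The answer's pattern as a Java string literal: each backslash is doubled.
    // CASE_INSENSITIVE lets "age" match "Age"; MULTILINE anchors ^ and $ per line.
    static final Pattern ROW = Pattern.compile(
        "^ *person *(?<person>.*?) *\\| *age *(?<age>.*?) *\\| *(?<someMarker>.*?) *\\|gender *(?<gender>.*)$",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    public static void main(String[] args) {
        String output =
            "person1 | Age 20 | M |Gender Male\n" +
            "person2 | Age 11 |   |Gender Female\n" +
            "person11| Age 28 | M |Gender Female\n";

        Matcher m = ROW.matcher(output);
        while (m.find()) {
            // Named groups are read back by name.
            System.out.printf("person=%s age=%s marker=[%s] gender=%s%n",
                m.group("person"), m.group("age"),
                m.group("someMarker"), m.group("gender"));
        }
    }
}
```

Because each `find()` handles one line on its own, the order of the rows in the output does not affect the result.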

brandonscript
  • Yes, but I really need to verify that person1 is person1 and that person1's age is 20. – ratzip Jan 30 '14 at 17:11
  • Then you're positively not going to be able to build that with a regular expression; if you KNOW what the data should look like, why is it coming out in the wrong order? What are you validating it against? You need to break this down and build a full parser based on your criteria. – brandonscript Jan 30 '14 at 17:13
  • Well, the data in the output is the same; it is just that the order of the entries is sometimes different. In that case, how do I match it? – ratzip Jan 30 '14 at 17:26
  • What I just said... you're not going to be able to match it if the output of one column changes while the others do not. You need to build a function that logically checks each section that might be different and validates it against what you expect it to be. Or better yet, figure out *why* it's coming out in the wrong order, and just fix that... – brandonscript Jan 30 '14 at 17:31
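One possible way to build such a check, sketched under the assumption that the goal is to confirm the five expected "M" rows from the question are present regardless of order. Each expected row gets its own line-anchored pattern, and every pattern must match somewhere in the output. The class name `UnorderedRowCheck` and the helpers `row` and `allPresent` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UnorderedRowCheck {

    /** Builds a pattern for one complete row whose third column is "M". */
    static Pattern row(String name, int age, String gender) {
        return Pattern.compile(
            "^" + Pattern.quote(name) + " *\\| *Age " + age
                + " *\\| *M *\\| *Gender " + Pattern.quote(gender) + " *$",
            Pattern.MULTILINE);
    }

    // The rows the question expects to find, in no particular order.
    static final List<Pattern> EXPECTED = Arrays.asList(
        row("person1", 20, "Male"),
        row("person3", 23, "Female"),
        row("person5", 41, "Male"),
        row("person11", 28, "Female"),
        row("person12", 31, "Male"));

    /** True when every expected row occurs somewhere in the output. */
    static boolean allPresent(String output) {
        for (Pattern p : EXPECTED) {
            if (!p.matcher(output).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String reordered =
            "person1 | Age 20 | M |Gender Male\n" +
            "person2 | Age 11 |   |Gender Female\n" +
            "person3 | Age 23 | M |Gender Female\n" +
            "person4 | Age 32 |   |Gender Male\n" +
            "person11| Age 28 | M |Gender Female\n" +
            "person12| Age 31 | M |Gender Male\n" +
            "person5 | Age 41 | M |Gender Male\n" +
            "person10| Age 33 |   |Gender Male\n" +
            "person8 | Age 26 |   |Gender Male\n";
        System.out.println(allPresent(reordered));
    }
}
```

Anchoring each pattern with `^` and `$` keeps `person1` from accidentally matching the `person11` or `person12` rows. This sketch does not reject unexpected extra rows; that would need an additional check, for example on the number of lines.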