
I'm writing a regex to parse the CUPS printers.conf file so I can extract some data and process it with a jar. Given this excerpt of printers.conf:

```
<Printer AO002LSR01>
UUID urn:uuid:db0082f5-a114-36ad-6a86-ae4225c64b31
Info AO002LSR01
Location AO002LSR01
MakeModel Generic CUPS-PDF Printer
DeviceURI socket://172.100.100.4:9100
State Idle
StateTime 1612866350
Reason toner-empty-warning
Type 8450124
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy retry-job
Attribute marker-colors \#000000
Attribute marker-levels 0
Attribute marker-names Black Cartridge HP CF226A
Attribute marker-types toner
Attribute marker-change-time 1612866349
</Printer>
<Printer AR000test>
UUID urn:uuid:9296f953-2df2-3ce9-5d4b-9108c5aa8b51
Info Zebra Test - Magazzino Arezzo
Location Magazzino Vestizione Arezzo
DeviceURI socket://192.168.9.5:9100
State Idle
StateTime 1471339234
Type 4
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy retry-job
</Printer>
```

As you might guess, the first Printer block is matched and its MakeModel, DeviceURI, Accepting and Shared groups are captured correctly, whereas the second block is not matched at all: it has no MakeModel line, and the lazy `[\s\S]*?` just scans past its `</Printer>` looking for one, so the other groups are never reached. What I'm aiming for is to capture all four keys even when one of them is missing.

The regex I've written so far is the following:

```
<Printer (.*)>[\s\S]*?((MakeModel) (.*))[\s\S]*?(DeviceURI) (.*)[\s\S]*?(Accepting) (.*)[\s\S]*?(Shared) (.*)[\s\S]*?<\/Printer>
```
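A minimal Python sketch reproduces the failure: it runs the regex above, unchanged, against a trimmed copy of the excerpt (only the relevant lines are kept, an assumption made for brevity). `PATTERN` and `SAMPLE` are illustrative names.

```python
import re

# The regex from the question, unchanged (Python accepts the escaped "\/").
PATTERN = re.compile(
    r"<Printer (.*)>[\s\S]*?((MakeModel) (.*))[\s\S]*?(DeviceURI) (.*)"
    r"[\s\S]*?(Accepting) (.*)[\s\S]*?(Shared) (.*)[\s\S]*?<\/Printer>"
)

# Trimmed copy of the printers.conf excerpt: the second block has no MakeModel.
SAMPLE = """\
<Printer AO002LSR01>
MakeModel Generic CUPS-PDF Printer
DeviceURI socket://172.100.100.4:9100
Accepting Yes
Shared Yes
</Printer>
<Printer AR000test>
DeviceURI socket://192.168.9.5:9100
Accepting Yes
Shared Yes
</Printer>
"""

matches = list(PATTERN.finditer(SAMPLE))
# The search starting at <Printer AR000test> needs a MakeModel somewhere after
# it, finds none before the end of the text, and fails, so that block is lost.
print(len(matches))         # 1: only the first block matches
print(matches[0].group(1))  # AO002LSR01
```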

I've already tried the `?` quantifier and negative lookahead/lookbehind, to no avail. Also, all the groups should belong to the same match, since the data for each printer has to stay together for post-processing. Could someone please help me?
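One possible way around this, sketched in Python under the assumption that two passes are acceptable (so it swaps the single regex for a different technique): first match each `<Printer NAME> ... </Printer>` block on its own, then look up each key inside that block, recording `None` when it is absent. `KEYS`, `BLOCK`, `parse_printers` and `SAMPLE` are hypothetical names for illustration.

```python
import re

# Hypothetical key list: the four properties asked about in the question.
KEYS = ("MakeModel", "DeviceURI", "Accepting", "Shared")

# Pass 1: one match per <Printer ...> ... </Printer> block. re.DOTALL lets
# ".*?" cross newlines, and because it is lazy it stops at the first
# </Printer>, so a match never spills into the next block.
BLOCK = re.compile(r"<Printer ([^>]+)>(.*?)</Printer>", re.DOTALL)


def parse_printers(text):
    """Return one dict per printer block; missing keys map to None."""
    printers = []
    for block in BLOCK.finditer(text):
        name, body = block.group(1), block.group(2)
        printer = {"Name": name}
        for key in KEYS:
            # Pass 2: find a "Key value" line inside this block only, so a
            # missing key yields None instead of failing the whole match.
            m = re.search(rf"^{re.escape(key)} (.*)$", body, re.MULTILINE)
            printer[key] = m.group(1) if m else None
        printers.append(printer)
    return printers


# Trimmed copy of the excerpt; unrelated lines such as Info are ignored.
SAMPLE = """\
<Printer AO002LSR01>
Info AO002LSR01
MakeModel Generic CUPS-PDF Printer
DeviceURI socket://172.100.100.4:9100
Accepting Yes
Shared Yes
</Printer>
<Printer AR000test>
Info Zebra Test - Magazzino Arezzo
DeviceURI socket://192.168.9.5:9100
Accepting Yes
Shared Yes
</Printer>
"""

for p in parse_printers(SAMPLE):
    print(p)
```

Because the key lookups run inside a single block, a missing MakeModel only leaves that field as `None` for AR000test; its DeviceURI, Accepting and Shared values are still captured and stay together in the same dict.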

Thanks

Manuel Celli
  • A bit inefficient, but something like this would work: https://regex101.com/r/fIjS56/1 – MonkeyZeus Feb 10 '21 at 18:33
  • Thank you very much for the prompt response, @MonkeyZeus! I follow the logic you applied there. Since I have to create an object out of the properties I capture with groups between the <Printer> and </Printer> tags, they should all belong to the same match: matching more than one line at a time and extracting the groups. I'll surely try to tweak your code, thanks! – Manuel Celli Feb 11 '21 at 08:21
  • You're welcome. If you want to make it a little easier then you can `.split()` on `(?<=<\/Printer>)\s+(?= – MonkeyZeus Feb 11 '21 at 12:46
  • Thanks a bunch, @MonkeyZeus. I was trying to use a single regex to stay as cross-language as possible and to avoid running multiple matches, but that is indeed an easier approach. I'll see what I get there. – Manuel Celli Feb 11 '21 at 14:25
  • Why? Do you plan to port your app to PHP, Python, .NET, or JS or something? Every programming language's regex flavor has its own nuances, which makes the regex itself non-portable. If you're actually trying to achieve portability then you will need to avoid language-specific features such as lookbehinds, named capture groups, and so much more... – MonkeyZeus Feb 11 '21 at 14:29
  • Even choosing to use `\d` instead of `[0-9]` can have consequences. https://stackoverflow.com/a/890734/2191572 – MonkeyZeus Feb 11 '21 at 14:33
  • Thank you so much! I've only ever used regex for simpler patterns, so I was somewhat at a loss here. The main reason for wanting cross-language portability is that I might port this from Python to Java, since I'm testing in Python at the moment. Also, I thought it would be easier to create an object from the properties, since each match would correspond to x named groups, each reflecting a property, so the object would basically already be defined and would just need to be translated. – Manuel Celli Feb 11 '21 at 14:44
  • Unless you're specifically commissioned (paid) to make something "portable" then the easiest thing to do is to program efficiently with the specific language you're using. Why introduce regex and take a performance hit for the sake of portability? Write good pseudo-code so that your future self will thank you. Unless you're intricately aware of the regex nuances between Java and Python then you will be helpless when the regex fails to work. Stack Overflow is a very convenient website but to think you can rely on it in 5 years when your code mysteriously breaks is not the best plan of action. – MonkeyZeus Feb 11 '21 at 14:53
  • Thank you very much! I'll try to use another approach, logically it would've been simpler to manipulate data that was already grouped by matches, but I get your point, thanks for the insight. – Manuel Celli Feb 11 '21 at 15:31
  • At the end of the day, it's your code. If you want to spend 3x as long writing it in one language for future hope of portability to some other language, instead of spending 1x in one language and 1x in another language, then that's your prerogative. Cross-language portability isn't free. Heck, wait until you realize your code breaks simply from upgrading the version number of your primary language or even from switching environments between Windows and Linux! – MonkeyZeus Feb 11 '21 at 16:00
  • Sure, I get what you're saying: I'm better off looping than overcomplicating patterns, since I can just match each row and handle it smartly, given that I've got start and end labels. Thanks a lot :-) – Manuel Celli Feb 11 '21 at 16:04
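The row-by-row approach the comments settle on could look like the following minimal Python sketch. It uses no regex at all, which sidesteps flavor differences such as lookbehind support; `parse_printers_by_line`, `KEYS` and `SAMPLE` are hypothetical names, and the sketch assumes one `Key value` pair per line, as in the excerpt.

```python
# Hypothetical key list: the four properties asked about in the question.
KEYS = ("MakeModel", "DeviceURI", "Accepting", "Shared")


def parse_printers_by_line(lines):
    """Read printers.conf-style lines; return one dict per <Printer> block."""
    printers = []
    current = None  # record being filled, or None between blocks
    for raw in lines:
        line = raw.strip()
        if line.startswith("<Printer ") and line.endswith(">"):
            # Start label: every wanted key begins as None (i.e. missing).
            current = {"Name": line[len("<Printer "):-1], **dict.fromkeys(KEYS)}
        elif line == "</Printer>":
            # End label: the record is complete.
            if current is not None:
                printers.append(current)
            current = None
        elif current is not None:
            # "Key value..." lines: split on the first space only.
            key, _, value = line.partition(" ")
            if key in KEYS:
                current[key] = value
    return printers


# Trimmed copy of the excerpt; unrelated lines are skipped by the KEYS check.
SAMPLE = """\
<Printer AO002LSR01>
MakeModel Generic CUPS-PDF Printer
DeviceURI socket://172.100.100.4:9100
Accepting Yes
Shared Yes
Attribute marker-types toner
</Printer>
<Printer AR000test>
DeviceURI socket://192.168.9.5:9100
Accepting Yes
Shared Yes
</Printer>
"""

for p in parse_printers_by_line(SAMPLE.splitlines()):
    print(p)
```

In practice `lines` could be an open file object for printers.conf (usually /etc/cups/printers.conf, which is normally readable only by root), since iterating a file yields its lines; the same start/end-label loop ports to Java with little more than `BufferedReader.readLine()` and a `Map`.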

0 Answers