
I need a regular expression that, given the following XML, matches all the products (productos) that have 'Bebidas' as a category (categoria). I have to do this in Sublime Text, so I can only use a regular expression (no dedicated XML parser allowed):

XML file: www.ethgf.com/electricos.xml

I have a problem when I use (?s)<producto>(.+?Bebidas.+?)<\/producto> because it highlights almost all the XML (the first 'producto' tag through the last tag closure).

cbeltrangomez

1 Answer


Since the question is about selecting whole <producto> nodes, you can use the following regex:

(?s)<producto>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?\s*<categoria>Bebidas<\/categoria>(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/producto>

It highlights every <producto> node that contains the Bebidas category, even if the child nodes do not follow a strict order. Because each child element is matched as <tag>text</tag> with no nested markup, a match cannot run past the closing </producto> of the node it started in.
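
To see why the lazy pattern from the question overshoots while the stricter one does not, here is a minimal Python sketch. It uses the answer's pattern written with the question's tag names. The sample data is hypothetical: the `<productos>` root, the `<nombre>` and `<precio>` elements, and all values are assumptions for illustration, and only `producto`, `categoria`, and `Bebidas` come from the question. Python's `re` module stands in for Sublime Text's regex engine here, so behavior on edge cases and on large files may differ.

```python
import re

# Hypothetical sample; the real file's structure may differ.
SAMPLE_XML = """<productos>
  <producto>
    <nombre>Licuadora</nombre>
    <categoria>Electricos</categoria>
  </producto>
  <producto>
    <nombre>Jugo</nombre>
    <categoria>Bebidas</categoria>
    <precio>10</precio>
  </producto>
  <producto>
    <categoria>Bebidas</categoria>
    <nombre>Agua</nombre>
  </producto>
</productos>"""

# Pattern from the question: ".+?" is lazy, but a match still begins at the
# first <producto> it sees and may cross </producto> boundaries to reach "Bebidas".
NAIVE = re.compile(r"(?s)<producto>(.+?Bebidas.+?)<\/producto>")

# Pattern from the answer: every child must look like <tag>text</tag>,
# so the match cannot step over a closing </producto>.
STRICT = re.compile(
    r"(?s)<producto>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?"
    r"\s*<categoria>Bebidas<\/categoria>"
    r"(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/producto>"
)

naive_matches = [m.group(0) for m in NAIVE.finditer(SAMPLE_XML)]
strict_matches = [m.group(0) for m in STRICT.finditer(SAMPLE_XML)]

# Pull the (hypothetical) <nombre> out of each strict match for display.
strict_names = [
    re.search(r"<nombre>([^<]*)</nombre>", text).group(1)
    for text in strict_matches
]

print("naive match count:", len(naive_matches))
print("first naive match swallows a non-Bebidas product:",
      "Licuadora" in naive_matches[0])
print("strict matches:", strict_names)
```

In this sample the first naive match starts at the Licuadora product and only ends after the first Bebidas product, which is the same effect the question describes on the full file.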


Wiktor Stribiżew
  • The problem is that the nodes are not in the same order in every entry, and some products have additional nodes containing other information. – cbeltrangomez Mar 31 '15 at 21:53
  • @cbeltrangomez: I updated the answer. I guess there are no "orphan" text nodes anywhere between the nodes you provided? If there are no such nodes, this should work. – Wiktor Stribiżew Mar 31 '15 at 22:21
  • This one is working for me in ST3 using the data provided in the question. @cbeltrangomez Please post **relevant** example data in your question. Someone could write an answer to match your data, then you come back and say "*That's not really my data, actually it includes this and this, as well.*" That's called moving the goalposts, and is severely frowned upon here. Give a [Minimal Complete Working Example](http://stackoverflow.com/help/mcve) in your question, so people can answer it as-posed, without relying on extra comments from you. – MattDMo Mar 31 '15 at 22:23
  • Thanks for the advice. I'm fairly new to Stack Overflow. I've changed the question, and the original file is at www.ethgf.com/electricos.xml. – cbeltrangomez Mar 31 '15 at 22:41
  • @cbeltrangomez thanks for clarifying things. I'll ask again, as you haven't addressed it yet - is there a particular reason why you don't want to use a parser? As I alluded to in my comment above, you could easily write a Python plugin (or external Python program) to parse your file for a desired attribute, and return whichever data you're looking for, or delete the containing node if that's what you want to do. This could be accomplished in ~5-10 lines with [`lxml`](http://lxml.de) in a fraction of the time it's taken to answer this question. – MattDMo Mar 31 '15 at 22:50
  • Hi, sadly the answer didn't work. Maybe it's because my file is larger: when I try to use that regex, Sublime stops working. I want to use a regular expression for two reasons: first, I don't know how to write Python plugins, and second, I really want to learn about regex, which I think is very useful when you are coding. Thanks for all the support. – cbeltrangomez Apr 01 '15 at 22:26
  • @cbeltrangomez: I see. You can split such files with third-party tools that split by a specific number of bytes, and then process them in chunks. You might lose a couple of nodes at the chunk boundaries, but you will get at least some results. – Wiktor Stribiżew Apr 02 '15 at 05:51
  • @cbeltrangomez: Since the question is not about parsing large files using Sublime Text, I think you can accept my answer, and ask another one about how to implement the `lxml` or any other XML parser for the tool (do not forget to post the outcome of your efforts). – Wiktor Stribiżew Apr 03 '15 at 08:28
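
For readers who can step outside Sublime Text, the parser route suggested in the comments might look roughly like the sketch below. It deliberately swaps in Python's standard-library `xml.etree.ElementTree` instead of `lxml`, so nothing needs to be installed. The sample data is hypothetical (the `<productos>` root, the `<nombre>` and `<precio>` elements, and all values are assumptions), and `products_in_category` is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's structure may differ.
SAMPLE_XML = """<productos>
  <producto>
    <nombre>Licuadora</nombre>
    <categoria>Electricos</categoria>
  </producto>
  <producto>
    <nombre>Jugo</nombre>
    <categoria>Bebidas</categoria>
    <precio>10</precio>
  </producto>
  <producto>
    <categoria>Bebidas</categoria>
    <nombre>Agua</nombre>
  </producto>
</productos>"""


def products_in_category(xml_text, category):
    """Return every <producto> element that has a <categoria> equal to `category`."""
    root = ET.fromstring(xml_text)
    return [
        producto
        for producto in root.iter("producto")
        # A product qualifies if any of its <categoria> elements carries the wanted text.
        if any(
            (cat.text or "").strip() == category
            for cat in producto.iter("categoria")
        )
    ]


matches = products_in_category(SAMPLE_XML, "Bebidas")
for producto in matches:
    # Serialize each matching node back to XML text.
    print(ET.tostring(producto, encoding="unicode").strip())
```

Unlike the regex, this does not depend on child order or on whether children contain nested markup. To run it against a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.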