
I have an XML file with several Item elements, some of which contain a Setting element whose Name attribute is SerialNumber. I'm trying to get every item name followed by its serial number.

My approach is to use Notepad++ regex to get the name of each Item and the value of the Setting named SerialNumber, something like this:

Sender0;3990 Sender3;4444 Sender4;7774

But when I try it, Notepad++ just selects all the text. My quick attempt was something like this:

^<Item Name="(.*)" Category=".*<Setting Name="SerialNumber">(.*)</Setting>.*</Item>

And replace:

(\1);(\2)

The XML:

    <Item Name="Sender0" Category="" ClassName="Cars" Schedule="" Enabled="true">
     <Setting>...</Setting>
     <Setting Name="SerialNumber">3990</Setting>
     <Setting>...</Setting>
    </Item>
    <Item Name="Sender1" Category="" ClassName="Cars" Schedule="" Enabled="true">
     <Setting>...</Setting>
     <Setting>...</Setting>
     <Setting>...</Setting>
    </Item>
    <Item Name="Sender2" Category="" ClassName="Cars" Schedule="" Enabled="true">
     <Setting>...</Setting>
     <Setting>...</Setting>
     <Setting>...</Setting>
    </Item>
    <Item Name="Sender3" Category="" ClassName="Cars" Schedule="" Enabled="true">
     <Setting>...</Setting>
     <Setting Name="SerialNumber">4444</Setting>
     <Setting>...</Setting>
    </Item>
    <Item Name="Sender4" Category="" ClassName="Cars" Schedule="" Enabled="true">
     <Setting>...</Setting>
     <Setting Name="SerialNumber">7774</Setting>
     <Setting>...</Setting>
    </Item>

Hope you can help me, thanks :)

GBS
  • Regex + XML = **evil evil evil** ... not my downvote, but regex in NPP is not the best tool to use here. Look into using an XML parser. – Tim Biegeleisen Mar 20 '19 at 15:05
  • Tim is right. What other technologies can you use? For example, would PowerShell be an option? – Tomalak Mar 20 '19 at 15:16
  • The idea is to make it work in Notepad++ using regex. I know I can do it with Java or in other ways, but the point is to understand why my regex isn't working as I expect. Sorry for the inconvenience, and thank you both for answering. – GBS Mar 20 '19 at 15:26
  • Regex isn't working because regex cannot be used with XML. Use one of the tools that have been made for XML processing, they exist for a reason. – Tomalak Mar 20 '19 at 15:31
  • I see, it's a really bad example to use with Notepad++ regex. – GBS Mar 20 '19 at 15:32
  • Indeed, yes. If processing this file is your task, I suggest PowerShell because that approach has zero external dependencies in Windows. If you just chose it as a way to learn regex, I suggest working with something other than XML. – Tomalak Mar 20 '19 at 15:41
  • [Obligatory link about the futility of trying to parse X/HTML with regexes](https://stackoverflow.com/a/1732454/62576) – Ken White Mar 20 '19 at 17:27
  • Ken, if you post it as an answer I will mark it as accepted. This is something that has to be here in case someone else lands on this question. – GBS Mar 20 '19 at 19:58
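
For anyone who wants to try the XML-parser route suggested in the comments, here is a minimal sketch in Python (my choice of language, nobody in the thread proposed it). It assumes the fragment has no single root element, so it wraps it in a hypothetical `<Items>` root; `serial_numbers` and `FRAGMENT` are names made up for illustration, and the sample is a shortened copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML; "..." is the elided content as posted.
FRAGMENT = """
<Item Name="Sender0" Category="" ClassName="Cars" Schedule="" Enabled="true">
 <Setting>...</Setting>
 <Setting Name="SerialNumber">3990</Setting>
</Item>
<Item Name="Sender1" Category="" ClassName="Cars" Schedule="" Enabled="true">
 <Setting>...</Setting>
</Item>
<Item Name="Sender3" Category="" ClassName="Cars" Schedule="" Enabled="true">
 <Setting Name="SerialNumber">4444</Setting>
 <Setting>...</Setting>
</Item>
"""


def serial_numbers(fragment):
    """Return (item name, serial number) pairs for items that have a serial."""
    # Assumption: the Items have no common parent, so add a hypothetical
    # <Items> root to make the text well-formed before parsing.
    root = ET.fromstring(f"<Items>{fragment}</Items>")
    pairs = []
    for item in root.iter("Item"):
        # Look for a direct child <Setting> whose Name attribute is SerialNumber.
        setting = item.find("Setting[@Name='SerialNumber']")
        if setting is not None:
            pairs.append((item.get("Name"), setting.text))
    return pairs


result = " ".join(f"{name};{serial}" for name, serial in serial_numbers(FRAGMENT))
print(result)
```

The parser does not care where the SerialNumber setting sits inside the Item or how the tags are spread over lines, which is the main thing the regex has to work around.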

1 Answer

I think regex is viable for this, unless some details are missing from the question.

Try this:

Search: \s*<Item\s*Name="([^"]+)"[^>]+>(?:\s*<Setting>.*?<\/Setting>)*(?:\s*<Setting Name="SerialNumber">(.*?)<\/Setting>)?(?:\s*<Setting>.*?<\/Setting>)*\s*<\/Item>

Replace by: (?2\1;\2 )

In Notepad++, the output for your given input would be: Sender0;3990 Sender3;4444 Sender4;7774

NOTE: Do not enable the ". matches newline" option. Also, enable "Match case" if you need it.
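
The note about ". matches newline" is probably also what explains the "selects all the text" symptom in the question. This is my guess, since the question does not say which options were ticked. The sketch below uses Python's `re` module rather than Notepad++'s Boost engine, with `re.DOTALL` standing in for the checkbox, and a shortened sample. The greedy `.*` behaves the same way in both engines for this case.

```python
import re

# The pattern from the question, unchanged.
PATTERN = (
    r'^<Item Name="(.*)" Category=".*'
    r'<Setting Name="SerialNumber">(.*)</Setting>.*</Item>'
)

# Shortened sample with the same shape as the question's XML.
XML = "\n".join([
    '<Item Name="Sender0" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting Name="SerialNumber">3990</Setting>',
    '</Item>',
    '<Item Name="Sender1" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting>...</Setting>',
    '</Item>',
    '<Item Name="Sender3" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting Name="SerialNumber">4444</Setting>',
    '</Item>',
])

# With "." allowed to cross line breaks, every greedy .* runs to the end of
# the text and backtracks only as far as it must, so one match swallows
# everything from the first <Item to the last </Item>.
with_dotall = re.search(PATTERN, XML, re.DOTALL)

# Without it, "." stops at the end of the line, and no single line contains
# both the <Item ...> tag and the SerialNumber setting.
without_dotall = re.search(PATTERN, XML)

print(with_dotall.group(0) == XML)
print(without_dotall)
```

So the original pattern fails either way: with the option on it matches too much, and with it off it matches nothing. That is why the search above uses `\s*` for line breaks and lazy `.*?` inside each Setting.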

Explanation:

\s*                  # 0 or more whitespace characters (space, tab, new line...)
<Item                # Literal '<Item'
    \s*              # 0 or more whitespace characters
    Name="           # Literal 'Name="'
        ([^"]+)      # Any non (") character repeated one or more times
                     #   stored on the first capturing group
    "                # Literal "
    [^>]+            # Any non (>) character repeated one or more
>                    # Literal >
# After matching the Item name, we look for its serial number, if it exists.
# The serial number may be surrounded by other settings, so we search for:
# perhapsSomeSettings + serialNumber + perhapsSomeSettings
# That way we can find the serial number (if it exists) whether it
# is placed as the first, last or middle tag.
(?:     # group
    \s*
    <Setting>.*?<\/Setting>
)*      # repeat 0 or more
(?:     # This 'setting' group will have the serial number
    \s*
    <Setting Name="SerialNumber">
    (.*?)    # We capture the data (second capturing group)
    <\/Setting>
)?     # Optional
(?:    
    \s*
    <Setting>
        .*?
    <\/Setting>
)*
\s*
<\/Item>
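
If you want to experiment with the commented layout outside Notepad++, it maps almost directly onto Python's `re.VERBOSE` mode. This is only a sketch under that assumption: Python's `re` is not the Boost engine, the space inside `<Setting Name=` has to be written as `[ ]` because verbose mode ignores bare whitespace, and `ITEM` and `XML` are names I made up with a shortened sample.

```python
import re

ITEM = re.compile(r"""
    \s*                          # 0 or more whitespace characters
    <Item \s* Name=" ([^"]+) "   # group 1: the item name
    [^>]+ >                      # rest of the opening tag
    (?: \s* <Setting> .*? </Setting> )*      # plain settings before the serial
    (?: \s* <Setting[ ]Name="SerialNumber">
        (.*?)                                # group 2: the serial number
        </Setting>
    )?                                       # the serial setting is optional
    (?: \s* <Setting> .*? </Setting> )*      # plain settings after the serial
    \s* </Item>
""", re.VERBOSE)

# Shortened sample with the same shape as the question's XML.
XML = "\n".join([
    '<Item Name="Sender0" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting>...</Setting>',
    ' <Setting Name="SerialNumber">3990</Setting>',
    ' <Setting>...</Setting>',
    '</Item>',
    '<Item Name="Sender1" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting>...</Setting>',
    '</Item>',
    '<Item Name="Sender3" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting Name="SerialNumber">4444</Setting>',
    '</Item>',
])

# Group 2 is None for an Item that has no SerialNumber setting.
pairs = [(m.group(1), m.group(2)) for m in ITEM.finditer(XML)]
print(pairs)
```

Every Item is matched, including the ones without a serial number. Those simply leave the second group unset, which is what the replacement relies on.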

Please also see this about greedy/lazy quantifiers.

For the replacement we use (?2\1;\2 )

(?2...) is special conditional syntax in Notepad++ (Boost) regex replacements. It means that if the second capturing group took part in the match, the text inside the parentheses is emitted. So in our case, (?2\1;\2 ) outputs the first capturing group (name), a semicolon, the second capturing group (serial number) and a trailing space. If the second group did not match, the whole Item is replaced with nothing.
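
Python's `re.sub` has no `(?2...)` conditional, but the same idea can be sketched with a replacement function. This is an emulation, not Notepad++ behaviour itself, and `conditional_replace` and `XML` are hypothetical names with a shortened sample.

```python
import re

# The search pattern from above, copied verbatim.
PATTERN = re.compile(
    r'\s*<Item\s*Name="([^"]+)"[^>]+>'
    r'(?:\s*<Setting>.*?<\/Setting>)*'
    r'(?:\s*<Setting Name="SerialNumber">(.*?)<\/Setting>)?'
    r'(?:\s*<Setting>.*?<\/Setting>)*'
    r'\s*<\/Item>'
)

# Shortened sample with the same shape as the question's XML.
XML = "\n".join([
    '<Item Name="Sender0" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting Name="SerialNumber">3990</Setting>',
    ' <Setting>...</Setting>',
    '</Item>',
    '<Item Name="Sender1" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting>...</Setting>',
    '</Item>',
    '<Item Name="Sender3" Category="" ClassName="Cars" Schedule="" Enabled="true">',
    ' <Setting>...</Setting>',
    ' <Setting Name="SerialNumber">4444</Setting>',
    '</Item>',
])


def conditional_replace(match):
    """Mimic (?2\\1;\\2 ): emit 'name;serial ' only if group 2 matched."""
    if match.group(2) is None:
        return ""  # Item without a serial number: drop it entirely
    return f"{match.group(1)};{match.group(2)} "


output = PATTERN.sub(conditional_replace, XML)
print(repr(output))
```

The trailing space in the output comes from the space before the closing parenthesis in the replacement, which is what separates the entries.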

Julio
  • If you could explain your regex a little, I'd be extremely grateful, both for learning purposes and to help people like me who land on this kind of question :) – GBS Mar 25 '19 at 08:59
  • Sure, @Ralsho. I edited my answer to add the explanation. – Julio Mar 25 '19 at 10:29