1

I have a big XML file (>300 MB) and I am using Notepad++ to find and replace with regular expressions. I need to select (and remove) an XML node that has multiple children spread across multiple lines.

<contact attrib="foo">
    <child1></child1>
    <child2></child2>
    ...
    <childN></childN>
</contact>

I tried searching with

<contact.*?</contact>

This only works if it's all on the same line. I'm having trouble selecting across multiple lines. Any suggestions?

Akash
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – bmargulies Jan 03 '12 at 18:07
  • Are you searching for child nodes, or child nodes that span multiple lines? Try this ``, instead of the regex dot `.` metachar. Should work if Notepad++ supports `\S`. –  Jan 03 '12 at 18:07
  • `` did not work either – Akash Jan 03 '12 at 19:13

2 Answers

1

The problem you're having lies with the engine Notepad++ uses for regular expressions. Refer to the answer posted here for solutions. I've had success with this particular one in advanced search mode:

Ctrl+M will insert something that matches newlines. They will be replaced by the replace string. I recommend this method as the most reliable, unless you really need to use regex.

Also, if you need to edit large XML files, I'd recommend an editor like Foxe for a more intuitive workflow.

bkzland
  • This was helpful. Due to time restrictions I ended up writing a PHP script to do the job. Thanks. – Akash Jan 03 '12 at 19:16
-2

What you're trying to accomplish is, strictly speaking, not regular, so it can't be expressed in a general way as a regular expression: the potential nesting (recursion) depth is unlimited. See "What does regular in regex/"regular expression" mean?" for details.
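To illustrate the nesting problem, here is a minimal Python sketch. The input string is made up for illustration and is not from the question. A lazy pattern stops at the first closing tag it finds, so a nested element is cut short:

```python
import re

# Hypothetical input: a <contact> nested inside another <contact>.
nested = "<contact><contact>inner</contact>tail</contact>"

# The lazy quantifier ends the match at the FIRST </contact>,
# which closes the inner element, not the outer one.
match = re.search(r"<contact\b.*?</contact>", nested, flags=re.DOTALL)
matched_text = match.group(0)
leftover = nested[match.end():]

print(matched_text)  # the outer opening tag plus the whole inner element
print(leftover)      # what a naive removal would leave behind
```

If the `<contact>` elements in the file never nest, this limitation does not come into play and a lazy pattern is enough.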

That being said, if what you're really trying to do is make your regular expression span multiple lines, most regular expression engines have an option that changes how line breaks are handled, so that . means 'every character' rather than 'every character but newline'. This is usually called 'single-line' or 'dot-all' mode; the option named 'multi-line' typically only changes where ^ and $ match. See your regex engine documentation for details.
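As a rough sketch of what such an option does, here is the same idea in Python's `re` module rather than Notepad++. The sample string is invented to mirror the structure in the question. The `re.DOTALL` flag makes the dot match newlines, so the lazy pattern can span lines:

```python
import re

# Made-up sample mirroring the structure in the question.
xml = """<root>
<contact attrib="foo">
    <child1></child1>
    <child2></child2>
</contact>
<keep>data</keep>
</root>"""

pattern = r"<contact\b.*?</contact>"

# Without DOTALL, "." stops at a newline, so nothing matches here.
without_flag = re.findall(pattern, xml)

# With DOTALL, "." also matches newlines and the whole node is matched.
with_flag = re.findall(pattern, xml, flags=re.DOTALL)

# Remove the node, along with the line break that follows it.
cleaned = re.sub(pattern + r"\n?", "", xml, flags=re.DOTALL)
print(cleaned)
```

In Python the inline form `(?s)` at the start of the pattern has the same effect as the flag. Note that this sketch holds the whole text in memory, which may matter for a file of several hundred megabytes.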

Nicholas Carey
  • He would still be able to limit the search with something like `.+?`, but the dot not matching newline and npp not supporting newline in regex-mode prevents all attempts in this direction. – bkzland Jan 03 '12 at 18:15