
I have a large number of records of the following type that I need to modify:

  1. I would like to remove the created_by="29" line without leaving a blank line behind. Note: a wildcard for the created_by value would be preferable.

  2. I would like to remove the entire creation_date="..." line, so that the closing /> moves up to directly after state="1" (giving state="1"/>).

  3. I would like to insert a new static line before the state attribute (e.g. modified_by="30").

XML:

<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>

What kind of regular expression should I use to change this?

josh3736
Sam

2 Answers


A regular expression is the wrong way to approach this problem for a whole host of reasons, many of which are outlined in the answers to this question.

Instead, you'll have fewer headaches if you use a proper XML parser together with XPath to identify the parts of your XML document that you want to change.
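The parser-based approach can be sketched with Python's standard-library `xml.etree.ElementTree`. This is one possible choice; the answer doesn't name a particular parser. The sketch makes three assumptions:

- The records sit under a hypothetical `<users>` root element, because a real XML file needs a single root.
- Only `<user>` elements that carry `created_by` are rewritten.
- Python 3.8 or later is used, since from that version ElementTree keeps attribute order when serializing.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the question's record wrapped in an assumed <users> root.
SAMPLE = """<users>
  <user id="1" org_id="3" created_by="29" state="1" creation_date="2010-06-01"/>
</users>"""


def rewrite_users(xml_text, modified_by="30"):
    """Drop created_by/creation_date and insert modified_by before state."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including [@attr] predicates.
    for user in root.findall(".//user[@created_by]"):
        new_attrs = {}
        for name, value in user.attrib.items():
            if name in ("created_by", "creation_date"):
                continue  # removed whatever their value is
            if name == "state":
                new_attrs["modified_by"] = modified_by  # placed right before state
            new_attrs[name] = value
        # Swap in the rebuilt attributes; insertion order is kept on output.
        user.attrib.clear()
        user.attrib.update(new_attrs)
    return ET.tostring(root, encoding="unicode")


print(rewrite_users(SAMPLE))
```

ElementTree writes all attributes of an element on one line, so the one-attribute-per-line layout of the original file is not preserved. For hand-maintained sample data, that is a trade-off against the search-and-replace approach.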

josh3736
I am just trying to manipulate sample data here using Eclipse. I don't intend to do this programmatically; if it's possible to solve it with a simple search-and-replace, I will probably stick with that, otherwise I'll do it manually. – Sam Jul 27 '10 at 07:18

Assuming the attributes always appear in the same order:

search: (\s+)created_by="[^"]+"(\s+state="[^"]+")\s+creation_date="[^"]+"

replace: $1modified_by="30"$2
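The search/replace pair above can be checked outside Eclipse with a short Python script. The pattern is used unchanged. Python's `re.sub` spells the group references `\g<1>` and `\g<2>` instead of `$1` and `$2`. The second record below is hypothetical, added to show that the `[^"]+` wildcard matches any created_by value and that every record in the file is rewritten.

```python
import re

# The question's record plus a hypothetical second one with a different created_by.
RECORDS = '''<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>
<user id="2"
      org_id="3"
      created_by="12"
      state="0"
      creation_date="2010-06-02"/>'''

# Same search pattern as above; assumes the attributes always appear in this order.
PATTERN = re.compile(r'(\s+)created_by="[^"]+"(\s+state="[^"]+")\s+creation_date="[^"]+"')


def rewrite(text):
    # Python equivalent of the replacement $1modified_by="30"$2
    return PATTERN.sub(r'\g<1>modified_by="30"\g<2>', text)


print(rewrite(RECORDS))
```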

If you need to specify the element name, you can add this to the beginning of the regex:

(<user(?:\s+\w+="[^"]+")+?)

...and change the capture-group references in the replacement like this:

$1$2modified_by="30"$3
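The element-scoped variant can be tested the same way. In this sketch a hypothetical `<group>` element with the same attributes is placed before the `<user>` record. The `<user` prefix leaves the group untouched, whereas the unscoped pattern would rewrite it too. As before, `\g<n>` stands in for `$n`.

```python
import re

# Hypothetical input: an invented <group> element followed by the question's <user> record.
DOC = '''<group id="7"
       created_by="29"
       state="1"
       creation_date="2010-06-01"/>
<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>'''

USER_ONLY = re.compile(
    r'(<user(?:\s+\w+="[^"]+")+?)'                 # group 1: "<user" and the attributes before created_by
    r'(\s+)created_by="[^"]+"'                     # group 2: whitespace before created_by
    r'(\s+state="[^"]+")\s+creation_date="[^"]+"'  # group 3: the state attribute
)


def rewrite_users_only(text):
    # Python equivalent of the replacement $1$2modified_by="30"$3
    return USER_ONLY.sub(r'\g<1>\g<2>modified_by="30"\g<3>', text)


print(rewrite_users_only(DOC))
```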

Alan Moore