I really have tried to solve this myself but have been bashing my head against a brick wall with this one.

I have a file with many rows like this:

<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>

I want a regex that returns just the string between name=" and the next ".

In this case, it's 'Net Salary per month € (3rd Applicant)' but it could be anything. That's what I meant by extracting a variable substring.

Thanks in advance.

Clive
  • Try this regex : ^ – Jacques Martin Aug 17 '15 at 16:23
  • Doesn't hurt to include what you have so far, it may help both you and others after you to see how it could be fixed. It's also a lot easier to learn regex if you allow us to fix what you already have instead of starting from scratch and doing what we think best. There are often many ways to solve a simple regex-problem. – melwil Aug 17 '15 at 16:41

4 Answers

(?<=name=")[^"]*

This should do it for you. See the demo:

https://regex101.com/r/uF4oY4/50

If you don't have lookarounds, then use

name="([^"]*)

and grab group 1.
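For anyone trying this in PowerShell, here is a minimal sketch of both variants. It assumes a shortened copy of the sample line held in a hypothetical `$line` variable and uses the .NET `[regex]::Match` method; it is an illustration, not the only way to apply these patterns.

```powershell
# Shortened sample line (assumption: the real line carries more attributes).
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# Variant 1: lookbehind, so the whole match is the attribute value.
$viaLookbehind = [regex]::Match($line, '(?<=name=")[^"]*').Value

# Variant 2: no lookaround; the value sits in capture group 1.
$viaGroup = [regex]::Match($line, 'name="([^"]*)').Groups[1].Value

$viaLookbehind
$viaGroup
```

Both variables should hold the same value. If other attributes could end in `name` (say, a hypothetical `displayname`), prefixing the pattern with `\b` or a space guards against matching those.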

vks

This may help: Regex = name="(.*?)"

DEMO

https://regex101.com/r/uF4oY4/51
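To see why the `?` matters here, a small sketch comparing the lazy pattern with a greedy one. The shortened `$line` and the variable names are assumptions made for illustration.

```powershell
# Shortened sample line (assumption: the real line carries more attributes).
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# Lazy: .*? stops at the first closing quote after name=".
$lazy = [regex]::Match($line, 'name="(.*?)"').Groups[1].Value

# Greedy, for comparison: .* runs on to the last quote in the line.
$greedy = [regex]::Match($line, 'name="(.*)"').Groups[1].Value

$lazy
$greedy
```

The lazy version yields only the name value, while the greedy one also swallows the attributes that follow it, up to the last quote.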

Let me know if it helps.

Vineet Kumar Doshi

As there are a lot of '"' characters after name, you will probably need a lazy quantifier.

Try:

^.*name=\"(.+?)\".*$

This matches the whole line and should give you what you want in the group (.+?).
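A rough PowerShell sketch of using this pattern with the `-match` operator, which fills the automatic `$Matches` variable. The shortened `$line` and the `$emptyLine` case are assumptions added for illustration.

```powershell
# Shortened sample line (assumption: the real line carries more attributes).
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'
$pattern = '^.*name=\"(.+?)\".*$'

# -match returns $true on success and stores the groups in $Matches.
$name = $null
if ($line -match $pattern) { $name = $Matches[1] }
$name

# Caveat: .+? needs at least one character, so an empty name="" makes the
# group run past the closing quote into the next attribute.
$emptyLine = '<outputColumn id="1" name="" description="x"/>'
$overshoot = if ($emptyLine -match $pattern) { $Matches[1] }
$overshoot
```

If empty names can occur in the file, a group such as `([^"]*)` avoids that overshoot.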

Sascha Kolberg

There are helpful regexes in the existing answers; using one with the -replace operator allows you to extract the information of interest in a single operation:

$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>'

# Extract the "name" attribute value.
# Note how the regex is designed to match the *full line*, which is then
# replaced with what the first (and only) capture group, (...), matched, $1
$line -replace '^.+ name="([^"]*).+', '$1'

This outputs a string with verbatim content Net Salary per month € (3rd Applicant).
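One thing to keep in mind when applying this to a whole file: -replace returns the input string unchanged when the regex doesn't match. A minimal sketch that filters first, assuming a hypothetical `$lines` array standing in for the file's lines:

```powershell
# Hypothetical stand-in for the file's lines; the second one has no name attribute.
$lines = @(
    '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>'
    '<someOtherElement id="1"/>'
)

$pattern = '^.+ name="([^"]*).+'

# Keep only matching lines, then reduce each one to its capture group.
$names = $lines |
    Where-Object { $_ -match $pattern } |
    ForEach-Object { $_ -replace $pattern, '$1' }

$names
```

With a real file, the lines would presumably come from Get-Content instead of the literal array.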


Taking a step back: Your sample line is a valid XML element, and it's always preferable to use a dedicated XML parser.

Parsing each line as XML will be slow, but perhaps you can parse the entire file, which offers a simple solution using PowerShell's property-based adaptation of the XML DOM, via the [xml] type (System.Xml.XmlDocument):

$fileContent = @'
<xml>
<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
<outputColumn id="427" name="Net Salary per month € (4th Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
</xml>
'@

([xml] $fileContent).xml.outputColumn.name

The above yields the "name" attribute values across all <outputColumn> elements:

Net Salary per month € (3rd Applicant)
Net Salary per month € (4th Applicant)
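If the <outputColumn> elements sit at varying depths in the real file, an XPath query is one possible alternative to dot notation. This is a sketch only: the `<root>` wrapper and the shortened elements are assumptions, and a file that declares a default XML namespace would additionally need a namespace manager.

```powershell
# Hypothetical, shortened document; the real file's root element may differ.
$fileContent = @'
<root>
  <group>
    <outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>
  </group>
  <outputColumn id="427" name="Net Salary per month € (4th Applicant)" description=""/>
</root>
'@

$doc = [xml] $fileContent

# //outputColumn/@name selects the name attribute of every outputColumn
# element, regardless of how deeply it is nested.
$names = $doc.SelectNodes('//outputColumn/@name') | ForEach-Object { $_.Value }

$names
```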
mklement0