Regex to extract variable substring

Question

I really have tried to solve this myself but have been bashing my head against a brick wall with this one.

I have a file with many rows like this:-

<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>

I want a regexp to return just the string between the name=" and the next "

In this case, it's 'Net Salary per month € (3rd Applicant)' but it could be anything. That's what I meant by extracting a variable substring.

Thanks in advance.

Doesn't hurt to include what you have so far, it may help both you and others after you to see how it could be fixed. It's also a lot easier to learn regex if you allow us to fix what you already have instead of starting from scratch and doing what we think best. There are often many ways to solve a simple regex-problem. — melwil, Aug 17 '15 at 16:41

score 2 · Answer 1 · answered Aug 17 '15 at 16:23

(?<=name=")[^"]*

This should do it for you. See the demo:

https://regex101.com/r/uF4oY4/50

If your regex engine doesn't support lookarounds, use

name="([^"]*)

and grab group 1.
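
To make the difference between the two variants concrete, here is a minimal sketch in Python (the language is an assumption; the question doesn't name one). The sample row is shortened for readability, and the variable names are only illustrative.

```python
import re

# Shortened version of the sample row from the question.
line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# Variant 1: lookbehind, so the whole match is the attribute value.
lookbehind = re.search(r'(?<=name=")[^"]*', line)

# Variant 2: no lookaround; the value is captured in group 1.
grouped = re.search(r'name="([^"]*)', line)

print(lookbehind.group(0))  # Net Salary per month € (3rd Applicant)
print(grouped.group(1))     # Net Salary per month € (3rd Applicant)
```

Both give the same text without the surrounding double quotes; the only difference is whether you read the whole match or group 1.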

vks

Thank you. No double quotes included, which is what I wanted. – Clive Aug 17 '15 at 16:39

Vineet Kumar Doshi · Answer 2 · 2015-08-17T16:48:01.787

2

This may help: Regex = name="(.*?)"

DEMO

https://regex101.com/r/uF4oY4/51
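
The ? after .* matters here because the line contains many more double quotes after the name. A small sketch of what happens with and without it, again assuming Python and a shortened sample row:

```python
import re

# Shortened version of the sample row from the question.
line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# Lazy: stops at the first closing quote after name=".
lazy = re.search(r'name="(.*?)"', line).group(1)

# Greedy: runs on to the last double quote on the line.
greedy = re.search(r'name="(.*)"', line).group(1)

print(lazy)    # Net Salary per month € (3rd Applicant)
print(greedy)  # Net Salary per month € (3rd Applicant)" description="" lineageId="426
```

Reading group 1 instead of the whole match is what leaves out the surrounding quotes.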

Let me know if it helps.

edited Aug 17 '15 at 16:48

answered Aug 17 '15 at 16:32

Vineet Kumar Doshi

Thanks for that. I didn't make it very clear but I didn't want the double quotes including. Good help though. – Clive Aug 17 '15 at 16:39
Edited. Now it returns group inside " ". – Vineet Kumar Doshi Aug 17 '15 at 16:48

score 0 · Answer 3 · answered Aug 17 '15 at 16:51

Because many '"' characters follow name on the line, you will probably need a lazy quantifier (the ? after the +) so that the match stops at the first closing quote.

Try:

^.*name=\"(.+?)\".*$

This matches the whole line and should give you what you want in the group (.+?).
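
A rough Python sketch of using this whole-line pattern, with two made-up rows added to show the edge cases (Python, the helper name extract_name, and the extra rows are assumptions for illustration only):

```python
import re

# The pattern from this answer, unchanged.
WHOLE_LINE = re.compile(r'^.*name=\"(.+?)\".*$')

def extract_name(row):
    """Return the captured name value, or None if the row doesn't match."""
    m = WHOLE_LINE.match(row)
    return m.group(1) if m else None

sample = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>'
no_name = '<!-- no name attribute here -->'                     # hypothetical row
empty_name = '<outputColumn id="427" name="" description=""/>'  # hypothetical row

print(extract_name(sample))      # Net Salary per month € (3rd Applicant)
print(extract_name(no_name))     # None
print(extract_name(empty_name))  # " description=
```

The last line is worth noting: because .+? needs at least one character, an empty name="" makes the group run on to a later quote. If empty names can occur in your file, ([^"]*) avoids that.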

Sascha Kolberg

score 0 · Answer 4 · answered Dec 09 '20 at 13:12

There are helpful regexes in the existing answers; using one with the -replace operator allows you to extract the information of interest in a single operation:

$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>'

# Extract the "name" attribute value.
# Note how the regex is designed to match the *full line*, which is then
# replaced with what the first (and only) capture group, (...), matched, $1
$line -replace '^.+ name="([^"]*).+', '$1'

This outputs a string with verbatim content Net Salary per month € (3rd Applicant).
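
For readers not using PowerShell, the same match-the-full-line-and-replace idea can be sketched with Python's re.sub (Python and the second, made-up row are assumptions for illustration):

```python
import re

# Shortened version of the sample row from the question.
line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# The pattern consumes the entire line; the replacement keeps only group 1.
name = re.sub(r'^.+ name="([^"]*).+', r'\1', line)
print(name)  # Net Salary per month € (3rd Applicant)

# A row without a name attribute does not match and comes back unchanged.
other = '<inputColumn id="1"/>'  # hypothetical row
unchanged = re.sub(r'^.+ name="([^"]*).+', r'\1', other)
print(unchanged == other)  # True
```

Because a non-matching row is returned as-is, it may be worth checking for a match first if the file mixes different kinds of rows.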

Taking a step back: Your sample line is a valid XML element, and it's always preferable to use a dedicated XML parser.

Parsing each line as XML will be slow, but perhaps you can parse the entire file, which offers a simple solution using PowerShell's property-based adaptation of the XML DOM, via the [xml] type (System.Xml.XmlDocument):

$fileContent = @'
<xml>
<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
<outputColumn id="427" name="Net Salary per month € (4th Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
</xml>
'@

([xml] $fileContent).xml.outputColumn.name

The above yields the "name" attribute values across all <outputColumn> elements:

Net Salary per month € (3rd Applicant)
Net Salary per month € (4th Applicant)
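
The parser-based approach is not specific to PowerShell. As a minimal sketch, assuming Python and its standard xml.etree.ElementTree module, and with the sample rows shortened and wrapped in the same <xml> element as above:

```python
import xml.etree.ElementTree as ET

# Shortened sample rows, wrapped in a single root element so they form
# one well-formed document.
file_content = """<xml>
<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>
<outputColumn id="427" name="Net Salary per month € (4th Applicant)" description=""/>
</xml>"""

root = ET.fromstring(file_content)

# Collect the name attribute of every <outputColumn> element.
names = [col.get("name") for col in root.iter("outputColumn")]

for n in names:
    print(n)
# Net Salary per month € (3rd Applicant)
# Net Salary per month € (4th Applicant)
```

A parser also decodes escaped characters in attribute values (for example &amp;quot;), which a regex on the raw text would return undecoded.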
