$line =~ s/^<(\w+)=\"(.*?)\">//;

What does this line of Perl mean?

anubhava

2 Answers


s/.../.../ is the substitution operator. It matches its first operand, which is a regular expression, against a string and replaces the matched text with its second operand.

By default, the substitution operator works on a string stored in $_. But your code uses the binding operator (=~) to make it work on $line instead.
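To make the difference concrete, here is a minimal sketch that runs the same substitution once against $_ and once against $line. The sample string is an assumption made up for illustration; it does not come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $line = '<hello="world">example';
$_       = '<hello="world">example';

# With no binding operator, s/// operates on $_.
s/^<(\w+)=\"(.*?)\">//;
print "$_\n";       # prints "example"

# With =~, the same substitution operates on $line instead.
$line =~ s/^<(\w+)=\"(.*?)\">//;
print "$line\n";    # prints "example"
```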

The two operands to the substitution operator are the bits delimited by the / characters (there are more advanced versions of these delimiters, but we'll ignore them for now). So the first operand is ^<(\w+)=\"(.*?)\"> and the second operand is an empty string (because there is nothing between the second and third / characters).

So your code says:

  • Examine the variable $line
  • Look for a section of the string which matches ^<(\w+)=\"(.*?)\">
  • Replace that part of the string with an empty string

All that is left now is to untangle the regular expression and see what it matches.

  • ^ - matches the start of the string
  • < - matches a literal < character
  • (...) - means capture this bit of the match and store it in $1
  • \w+ - matches one or more "word characters" (where a word character is a letter, a digit or an underscore)
  • = - matches a literal = character
  • \" - matches a literal " character (the \ is unnecessary here)
  • (...) - means capture this bit of the match and store it in $2
  • .*? - matches zero or more instances of any character except a newline, and as few of them as possible (the ? makes the * non-greedy)
  • \" - matches a literal " character (once again, the \ is unnecessary here)
  • > - matches a literal >
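The breakdown above can be checked with a short sketch. The input strings are hypothetical, and the greedy pattern (using .* in place of .*?) is included only for comparison; it is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $line = '<hello="world">example';

# After a successful substitution, $1 and $2 hold the two captures.
my ($name, $value);
if ($line =~ s/^<(\w+)=\"(.*?)\">//) {
    ($name, $value) = ($1, $2);
    print "name:  $name\n";    # hello
    print "value: $value\n";   # world
    print "rest:  $line\n";    # example
}

# Non-greedy versus greedy: the input contains "> twice.
my $tricky = '<a="b">c">d';
(my $lazy   = $tricky) =~ s/^<(\w+)=\"(.*?)\">//;   # stops at the first ">
(my $greedy = $tricky) =~ s/^<(\w+)=\"(.*)\">//;    # runs to the last ">
print "lazy:   $lazy\n";     # c">d
print "greedy: $greedy\n";   # d
```

The copy-then-substitute idiom, (my $copy = $original) =~ s/...//, leaves $tricky untouched so that both patterns run against the same input.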

So, all in all, this looks like a slightly broken attempt to match XML or HTML. It matches tags of the form <foo="bar"> (which isn't valid XML or HTML) and replaces them with an empty string.

Dave Cross

It's searching for an XML-like tag at the start of a string and substituting it with nothing (i.e. removing it).

For example, in the input:

<hello="world">example

The regex will match <hello="world">, and substitute it with nothing - so the final result is just:

example
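As a rough sketch of how the anchoring behaves, the loop below applies the substitution to a few made-up lines (all of them assumptions for illustration). It relies on s/// returning a true value when it replaced something and a false value otherwise.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only for illustration.
my @lines = (
    '<hello="world">example',     # tag at the start: removed
    'example <hello="world">',    # tag not at the start: left alone
    '<a="1"><b="2">text',         # only the first tag is removed
);

for my $line (@lines) {
    my $before  = $line;
    # $line is an alias for the array element, so @lines is edited in place.
    my $changed = ($line =~ s/^<(\w+)=\"(.*?)\">//) ? 'yes' : 'no';
    print "$before => $line (changed: $changed)\n";
}
```

Because of the ^ anchor and the absence of a /g modifier, at most one tag is removed per string, and only when it is the very first thing in that string.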

In general, this is something that you shouldn't do with regex. There are a dozen different ways this pattern can produce false negatives: tags that ought to be stripped but are left in the string.
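A few such cases, sketched with made-up inputs (none of them come from the question), each of which the pattern leaves untouched:

```perl
use strict;
use warnings;

# Hypothetical near-miss inputs, chosen only for illustration.
my @inputs = (
    ' <hello="world">example',     # leading space, so ^< cannot match
    "<hello='world'>example",      # single quotes instead of double quotes
    '<hello = "world">example',    # spaces around the = sign
    '<my-tag="world">example',     # a hyphen is not a \w character
);

my @unchanged;
for my $input (@inputs) {
    # Work on a copy so the original can be compared afterwards.
    (my $out = $input) =~ s/^<(\w+)=\"(.*?)\">//;
    my $status = $out eq $input ? 'unchanged' : 'stripped';
    push @unchanged, $input if $status eq 'unchanged';
    print "$status: $input\n";
}
```

All four inputs are reported as unchanged.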

But if this is a "quick and dirty" script, where you don't need to worry about all possible edge cases, then it may be OK to use.

Tom Lord
  • That's not valid XML, so doubt many of the XML parsers would accept it. – Chris Turner Sep 13 '17 at 09:07
  • Agreed -- but I presume that this line of code is part of a larger "XML sanitisation" script. I bet there's a bunch more substitution commands, to strip "other formats" of tags. – Tom Lord Sep 13 '17 at 10:02