
I am downloading a webpage and converting it into a string using LWP::Simple. When I copy the result into an editor, I find multiple instances of the pattern I'm looking for, "data-src-hq".

While I'm ultimately trying to do something more complex with regex, I am starting in baby steps so I can properly learn how to use it. I started by just matching "data-src-hq" with the following code:

    if($html =~ /data-src-hq/ism)
    {
      print "match\n";
    }
    else
    {
      print "nope\n";
    }

My code prints "nope". However, if I change the pattern to just "data" or "data-src", I do get a match. The same happens no matter how I use and combine the single-line (/s) and multiline (/m) modifiers.

My understanding is that a hyphen is not a special character unless it's inside brackets (a character class). Am I missing something simple?
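That understanding is correct, and a quick way to confirm it is a small test script. The following is a minimal sketch; the helper `matches` and the sample strings are hypothetical, made up for illustration. It shows that outside a character class `-` is literal, while inside `[...]` it forms a range unless it is placed first or last.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str matches the compiled pattern $re, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# Outside [...], '-' is an ordinary character, so this is the literal text.
print matches('<img data-src-hq="x">', qr/data-src-hq/), "\n";  # 1

# Inside [...], 'a-c' is a range (a, b or c); '-' itself is not included.
print matches('-', qr/^[a-c]$/), "\n";   # 0
print matches('b', qr/^[a-c]$/), "\n";   # 1

# A hyphen placed last (or first, or escaped as \-) in the class is literal.
print matches('-', qr/^[a-c-]$/), "\n";  # 1
```

So the hyphen in /data-src-hq/ cannot be the reason for a failed match.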

Caractacus
  • You're missing `else` – ctwheels Nov 26 '19 at 20:11
  • I don't know anything about Perl syntax, but any other language would crash if you're missing the `else` like that – MonkeyZeus Nov 26 '19 at 20:12
  • Without the `else`, that second block is just a bare block scope and runs the contained code unconditionally. – Grinnz Nov 26 '19 at 20:14
  • @MonkeyZeus it runs as a code block because it's in `{}`. In this case, it checks the `if` statement condition, runs the code block if it evaluates to true, then runs the next code block (a bare block, as @Grinnz mentioned). It's equivalent to a loop that runs once. – ctwheels Nov 26 '19 at 20:14
  • @Grinnz So wouldn't OP see a possible `match` followed by a guaranteed `nope`? This of course assumes that OP didn't leave out any details from their testing... – MonkeyZeus Nov 26 '19 at 20:17
  • @MonkeyZeus that's exactly what I get when I set `$html = "data-src-hq";` – ctwheels Nov 26 '19 at 20:18
  • @Caractacus If you're just starting to learn regex, head over to https://regex101.com/ and test things out using the PCRE (Perl Compatible Regular Expressions) option. Whatever regex you write should be directly transferable to the right side of `=~`. If you get stuck, click on "Code Generator" and the site will show you a sample of Perl code. – MonkeyZeus Nov 26 '19 at 20:25
  • https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454 – Holli Nov 26 '19 at 20:44
  • Honestly, that was a typo; fixed the OP – Caractacus Nov 27 '19 at 19:39
  • @MonkeyZeus Thanks for that site. It appears that what I got from downloading the page was not exactly the same as what I saw when viewing the source in the browser, hence the lack of matching. Am I able to close this question? – Caractacus Dec 01 '19 at 04:48
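Since the root cause turned out to be that the downloaded HTML differed from the browser's view-source, it can help to list what is actually in the fetched string. Below is a minimal debugging sketch; the helper `attr_names`, the sample string and `$url` are hypothetical. It counts every attribute name that starts with "data-src" so the two versions of the page can be compared.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: count every attribute name beginning with
# "data-src" in $html, to compare the downloaded HTML with view-source.
sub attr_names {
    my ($html) = @_;
    my %seen;
    # [\w-]* : the hyphen is last in the class, so it is a literal '-'
    $seen{$1}++ while $html =~ /\b(data-src[\w-]*)/g;
    return %seen;
}

# In real use $html would come from LWP::Simple, e.g. (assumption: $url is
# the page being fetched):
#   use LWP::Simple qw(get);
#   my $html = get($url);
#   die "download failed\n" unless defined $html;
my $sample = '<img data-src="a.jpg" data-src-hq="b.jpg"><img data-src="c.jpg">';
my %names = attr_names($sample);
print "$_: $names{$_}\n" for sort keys %names;
```

If "data-src-hq" is missing from this list for the downloaded page, the regex is not the problem; the server simply sent different HTML (for example, markup the browser builds with JavaScript will not be in what LWP::Simple receives).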

1 Answer


How to fix this?

You are likely getting two lines of output, match followed by nope, whenever the pattern is found. Your code is missing the else keyword:


    if($html =~ /data-src-hq/ism)
    {
      print "match\n";
    }
    {   # no 'else': this is a bare block, so it always runs
      print "nope\n";
    }
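To see the two lines of output without downloading anything, here is a minimal sketch that wraps the original else-less code in a hypothetical sub `buggy`, which collects the lines in an array instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the original code: it collects the lines
# in @out instead of printing them, so the result is easy to inspect.
sub buggy {
    my ($html) = @_;
    my @out;
    if ($html =~ /data-src-hq/ism)
    {
        push @out, "match";
    }
    {   # no 'else': this is a bare block, so it always runs
        push @out, "nope";
    }
    return @out;
}

print join(" ", buggy('<img data-src-hq="x">')), "\n";   # match nope
print join(" ", buggy('<img data-src="x">')), "\n";      # nope
```

"nope" shows up in both cases, which is why the output looks like a failed match.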

Should be:


    if($html =~ /data-src-hq/ism)
    {
      print "match\n";
    }
    else {
      print "nope\n";
    }

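Otherwise, your code is fine: it correctly reports whether data-src-hq appears in $html. Only the /i modifier has any effect on this pattern; /s changes what . matches and /m changes what ^ and $ match, and none of those appear in data-src-hq.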
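Since the page contains multiple instances of the pattern, a natural next baby step is counting them rather than only testing for one. This is a minimal sketch, assuming a hypothetical sub `count_hq` and a made-up sample string in place of the downloaded page; it uses the standard Perl idiom of assigning a /g match in list context to an empty list and then to a scalar.

```perl
use strict;
use warnings;

# Hypothetical helper: how many times the literal text "data-src-hq"
# appears in $html. The /g match in list context returns all matches;
# assigning that list to () and then to a scalar yields the count.
sub count_hq {
    my ($html) = @_;
    my $count = () = $html =~ /data-src-hq/gi;
    return $count;
}

# Hypothetical sample standing in for the downloaded page.
my $sample = '<img data-src-hq="a.jpg"><img data-src="b.jpg"><img data-src-hq="c.jpg">';
print count_hq($sample), "\n";   # 2
```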


So why does your existing code output nope?

That's because a {} on its own is a bare block (see Basic BLOCKs in Perl's perlsyn documentation), so it runs unconditionally after the if statement. An excerpt from the documentation:

A BLOCK by itself (labeled or not) is semantically equivalent to a loop that executes once. Thus you can use any of the loop control statements in it to leave or restart the block. (Note that this is NOT true in eval{}, sub{}, or contrary to popular belief do{} blocks, which do NOT count as loops.) The continue block is optional.
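The "loop that executes once" behavior can be checked directly. In this minimal sketch (the sub `run_block` is hypothetical), `last` leaves the bare block early, exactly as it would leave a real loop.

```perl
use strict;
use warnings;

# Hypothetical helper: records which statements run.
sub run_block {
    my ($leave_early) = @_;
    my @steps;
    {
        push @steps, 'start';
        # A bare block counts as a loop that runs once, so 'last' leaves it.
        last if $leave_early;
        push @steps, 'rest of block';
    }
    push @steps, 'after block';
    return join ',', @steps;
}

print run_block(1), "\n";   # start,after block
print run_block(0), "\n";   # start,rest of block,after block
```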

ctwheels