I'm having a problem trying to ignore whitespace in-between certain characters. I've been Googling around for a few days and can't seem to find the right solution.

Here's my code:

// Get Image data
preg_match('#<a href="(.*?)" title="(.*?)"><img alt="(.*?)" src="(.*?)"[\s*]width="150"[\s*]height="(.*?)"></a>#', $data, $imagematch);
$image = $imagematch[4];

Basically these are some of the scenarios I have:

 <a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>

(Notice the lack of a space between width="" and src="")

And

<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>

(Notice the lack of a space in between width="" and height="".)

Is there any way to ignore the whitespace between those characters? I am not a regex expert.

Benjamin Loison
jameslfc19

1 Answer

Add \s? wherever a single optional space is allowed.

\s stands for a whitespace character.

? says the preceding character may occur once or not at all.

If more than one space is allowed and the space is optional, use \s*.

* says the preceding character can occur zero or more times.
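To make the difference between the two quantifiers concrete, here is a minimal sketch that tries both on the width/height part of the tag. The helper name gapMatches and the sample strings are made up for illustration; only preg_match is a real PHP function.

```php
<?php
// Hypothetical helper: does $text match when $gap is placed between the two attributes?
function gapMatches(string $gap, string $text): bool
{
    return preg_match('#width="150"' . $gap . 'height="(.*?)"#', $text) === 1;
}

// Illustrative inputs with zero, one and two spaces between the attributes.
$samples = [
    'no space'   => 'width="150"height="84"',
    'one space'  => 'width="150" height="84"',
    'two spaces' => 'width="150"  height="84"',
];

foreach ($samples as $label => $text) {
    printf(
        "%-10s  \\s? => %s  \\s* => %s\n",
        $label,
        var_export(gapMatches('\s?', $text), true), // zero or one whitespace character
        var_export(gapMatches('\s*', $text), true)  // zero or more whitespace characters
    );
}
```

Both quantifiers accept the first two inputs; only \s* accepts the one with two spaces.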

'#<a href\s?="(.*?)" title\s?="(.*?)"><img alt\s?="(.*?)" src\s?="(.*?)"\s*width\s?="150"\s*height\s?="(.*?)"></a>#'

allows an optional space between attribute name and =.
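As a rough check, the sketch below runs the pattern from the question against the two sample tags from the question, with \s* used for the gaps before width and height. The variable names $pattern, $samples and $images are illustrative; $imagematch follows the question's code.

```php
<?php
// The question's pattern, with \s* (zero or more whitespace) before width and height.
$pattern = '#<a href="(.*?)" title="(.*?)"><img alt="(.*?)" src="(.*?)"\s*width="150"\s*height="(.*?)"></a>#';

// The two scenarios quoted in the question.
$samples = [
    '<a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>',
    '<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>',
];

$images = [];
foreach ($samples as $data) {
    // Capture group 4 is the src attribute, as in the question's code.
    if (preg_match($pattern, $data, $imagematch) === 1) {
        $images[] = $imagematch[4];
    }
}

print_r($images);
```

Both tags should match, so $images ends up holding the two src URLs.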

If you also want to allow an optional space after the =, add a \s? after it as well.

Likewise, wherever a character is optional, follow it with ? if it may occur at most once, or with * if it may occur any number of times.

Your actual problem was [\s*]. Characters enclosed in [ and ] form a character class, so [\s*] matches exactly one character that is either a whitespace character or a literal *. A character class matches any one of its members once, so remove the * from it. If you append a quantifier (?, +, * etc.) after the ], the characters in the class can occur as many times as the quantifier allows.
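The difference between a * inside the brackets and a * after them can be seen in a small sketch. The test strings and the variable names $class and $quantified are invented for illustration.

```php
<?php
// [\s*] is a character class: exactly one character, either whitespace or a literal "*".
$class = '#^a[\s*]b$#';
// [\s]* applies the quantifier to the class: zero or more whitespace characters.
$quantified = '#^a[\s]*b$#';

foreach (['a b', 'a*b', 'ab', 'a  b'] as $text) {
    printf(
        "%-6s  class => %d  quantified => %d\n",
        var_export($text, true),
        preg_match($class, $text),
        preg_match($quantified, $text)
    );
}
```

The class version accepts 'a b' and 'a*b' but rejects 'ab' and 'a  b'. The quantified version accepts everything except 'a*b'.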

Naveed S
  • Thanks! I changed [\s*] to \s? and it works now! :) Thank you! – jameslfc19 Jan 12 '13 at 12:02
  • @jameslfc19 `\s?` means 0 or 1 whitespace characters. However, what if there are more than 1 whitespace characters? You want `\s*` so it will match 0 or **more**. Btw you do not want to use regex to parse HTML. You want to use one of [these](http://stackoverflow.com/q/3577641/1592648) methods. – kittycat Jan 12 '13 at 12:20
  • @naveed-s I'm having an issue with trailing space in named capturing but couldn't make it working can you please guide me on what I'm missing? [Link to RegExp](https://regex101.com/r/WotpaP/1) The word "contact" must include in the match searchTerm that's what I'm trying to achieve. – HenonoaH Mar 22 '21 at 18:23