1

I'm having trouble with regex and have never fully understood it. The real question is: does anybody know a good site that explains what each part of an expression does, instead of just posting something like

$regexp = "/^[^0-9][A-z0-9_]+([.][A-z0-9_]+)*[@][A-z0-9_]+([.][A-z0-9_]+)*[.][A-z]{2,4}$/";

and then prattling off what that line as a whole will do, rather than what each expression does. I've tried googling many different versions of "preg_replace" and "regex tutorial", but they all seem to assume that we already know what stuff like \b[^>]* will do.

Secondary: the reason I am trying to do this is that I want to turn

<span style="color: #000000">*ANY NUMBER*</span>

into

<span style="color: #0000ff">*ANY NUMBER*</span>

Here are a few variations that I have already tried. Some just didn't work; some make the script crap out.

$data = preg_replace("/<span style=\"color: #000000\">([0-9])</span>/", "<span style=\"color: #FFCC00\">$1</span>", $data);//just tried to match atleast 0-9

$data = preg_replace("/<span style=\"color: #000000\"\b[^>]*>(.*?)</span>/", "<span style=\"color: #FFCC00\">$1</span>", $data);

$data = preg_replace("/<span style=\"color: #000000\"\b[^>]*>([0-9])</span>/", "<span style=\"color: #FFCC00\">$1</span>", $data);

The answer to this specific problem is not nearly as important to me as a good site, so the check goes to that. I've tried a lot of different sites, and I am pretty sure it's not above my comprehension; I just cannot find a good one among all the bad tutorial/example farms. My normal fallbacks of w3 and phpdotnet don't have what I need this time.

EDIT1 For those of you who end up in here looking for a similar answer:

$data = preg_replace("/<span style=\"color: #000000\">([0-9]{1,})<\/span>/", "<span style=\"color: #FFCC00\">$1</span>", $data);

This did what it needed to. Sadly, it was one of the first things I tried, but because I didn't put <\/span> instead of </span> it was not working. I do not know if "[0-9]{1,}" is the MOST appropriate way of matching any number (it tells the engine to match any digit 0-9 with [0-9], at least once and as many times as it can with {1,}), but it still fit the purpose.
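On the question of whether "[0-9]{1,}" is the most appropriate form: a minimal sketch, assuming plain ASCII input and no /u modifier, suggesting that `[0-9]{1,}`, `[0-9]+` and `\d+` accept the same strings. The helper name `matchesEach` and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: run one pattern against every sample and
// return a list of booleans (true = the sample matched).
function matchesEach(string $pattern, array $samples): array
{
    return array_map(
        // preg_match returns 1 on a match and 0 on no match.
        fn (string $sample): bool => preg_match($pattern, $sample) === 1,
        $samples
    );
}

// Illustrative inputs: two numbers, an empty string, and a mixed string.
$samples = ['7', '2012', '', '12a'];

// Three spellings of "one or more digits", anchored to the whole string.
$withBraces = matchesEach('/^[0-9]{1,}$/', $samples);
$withPlus   = matchesEach('/^[0-9]+$/', $samples);
$withClass  = matchesEach('/^\d+$/', $samples);
```

`{1,}` and `+` are the same quantifier written two ways, so choosing between them is mostly a matter of readability.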

ROY Finley posted: http://www.macronimous.com/resources/writing_regular_expression_with_php.asp It's a good site with a list of expression definitions and a good example workup below.

Also: regular-expressions.info/tutorial.html was posted a few times. It's a slower, more in-depth walkthrough, but if you are stuck like I am, it's good.

Will pop in about regex101 and the parsers after I have had a chance to play with them.

EDIT2 DWright posted a book link below, "Mastering Regular Expressions". If you look at regex and cannot make heads or tails of the convolution of characters, it is DEFINITELY worth picking up. It took about an hour and a half to read about half of it, but that is no time compared to the hours spent on Google and the messed-up workarounds used to avoid regex.

Also, the HTML parser linked below would be the right tool for this particular problem.

Noname Provided
  • Have you tried http://www.regular-expressions.info/tutorialcnt.html – RonaldBarzell Dec 21 '12 at 23:55
  • Go to http://regexpal.com/ and try your regex live... – veritas Dec 21 '12 at 23:56
  • @veritas it should be noted that regexpal uses JavaScript's regex engine which has a few severe limitations – Martin Ender Dec 21 '12 at 23:57
  • @veritas: Good point. The OP should also experiment with a variety of regular expressions; that will be most instructive. – RonaldBarzell Dec 21 '12 at 23:58
  • Checking out the listed sites as we speak gunna need a minute – Noname Provided Dec 21 '12 at 23:58
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules. – Andy Lester Dec 22 '12 at 22:37
  • Who told you to use a regex with `[A-z]` in it? That's a pretty common error; intended to match an uppercase or lowercase ASCII letter, it also matches the characters (`[`, `]`, `^`, backtick and backslash), whose code points happen to lie between `Z` and `a`. – Alan Moore Dec 23 '12 at 01:15
  • @Andy Lester I am looking into HTML parsing now. The HTML elements I am trying to affect are generated by me, so if the HTML changes away from my expectations, in this case I have bigger problems. – Noname Provided Dec 23 '12 at 06:05
  • @Alan Moore That first snippet I posted was copy-pasted out of regular-expressions.info/examples.html for the purpose of showing why I was having trouble finding what I needed with Google. – Noname Provided Dec 23 '12 at 06:05
  • So it's just an example of the kind of junk answers you're talking about? That's a relief! I usually don't bother trying to correct email regexes, but `[A-z]` always catches my eye. – Alan Moore Dec 23 '12 at 12:59
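To make the `[A-z]` point from the comments concrete, here is a small sketch that lists every ASCII character `[A-z]` accepts but `[A-Za-z]` does not. The variable names are illustrative only.

```php
<?php
// [A-z] covers code points 65 ('A') through 122 ('z'), so it also
// picks up the punctuation that sits between 'Z' (90) and 'a' (97).
$loose  = '/^[A-z]$/';
$strict = '/^[A-Za-z]$/';

$extras = [];
for ($code = 0; $code < 128; $code++) {
    $char = chr($code);
    // Keep characters the loose class accepts but the strict one rejects.
    if (preg_match($loose, $char) === 1 && preg_match($strict, $char) !== 1) {
        $extras[] = $char;
    }
}

// $extras now holds the non-letters that [A-z] lets through.
echo implode(' ', $extras), PHP_EOL;
```

The underscore is one of them, which is why `[A-z0-9_]` in the question's email pattern lists `_` redundantly.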

3 Answers

6

To have a regex explained, you can have a look at Regex101. To actually learn regular expressions (which I recommend), this is a pretty good, in-depth tutorial. After you have read that, the PCRE documentation on PHP.net shouldn't seem too arcane any more, and reading it will help you get your head around some specific differences for PHP.

However, for the problem at hand, you shouldn't actually be using regex at all. A DOM parser is the way to go. Here is a very convenient third-party one, and this is what PHP brings along itself. As mentioned by hakre, here is a more extensive list of libraries available for this purpose.
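As a rough illustration of the DOM route, here is a minimal sketch using PHP's bundled `DOMDocument`. The function name `recolorNumericSpans`, the wrapper `<div>`, and the sample markup are assumptions made up for this example, and it assumes ASCII input (by default `loadHTML` does not treat its input as UTF-8).

```php
<?php
// Hypothetical helper: change the color of every <span> whose style
// contains $from and whose text consists only of digits.
function recolorNumericSpans(string $html, string $from, string $to): string
{
    $doc = new DOMDocument();

    // Wrap the fragment so the parser has a single container to work with,
    // and keep libxml's parse warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('span') as $span) {
        $style = $span->getAttribute('style');
        $isNumber = preg_match('/^\d+$/', trim($span->textContent)) === 1;
        if ($isNumber && stripos($style, $from) !== false) {
            $span->setAttribute('style', str_ireplace($from, $to, $style));
        }
    }

    // The wrapper is the first <div> in document order; serialize its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input  = '<span style="color: #000000">42</span><span style="color: #000000">abc</span>';
$output = recolorNumericSpans($input, '#000000', '#0000ff');
```

Only the numeric span is touched here; the regex is reduced to a simple check on the text content instead of having to describe the surrounding markup.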

Another general recommendation for regexes in PHP: use single quotes '/pattern/', because double quotes cause a lot of trouble with escape sequences (you need to double some backslashes otherwise).
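One way to see the kind of trouble double quotes can cause is a backreference such as `\1`, which PHP itself interprets as an octal escape inside double quotes. The sample pattern below is made up for illustration.

```php
<?php
// Double quotes: PHP turns "\1" into the single byte 0x01 before the
// regex engine ever sees the string, so this is no longer a backreference.
$double = "/(a)\1/";

// Single quotes: the backslash survives, and PCRE sees the backreference \1.
$single = '/(a)\1/';

$doubleLength = strlen($double);            // one character shorter than $single
$singleLength = strlen($single);

$doubleResult = preg_match($double, 'aa');  // looks for "a" followed by byte 0x01
$singleResult = preg_match($single, 'aa');  // looks for "a" followed by the same "a"
```

Single quotes also spare you the `\"` escapes around the HTML attribute values in the question's patterns.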

Finally, the reason you get errors is that your regex delimiter (you use /) shows up in your pattern (in the closing span tag) without it being escaped. That means the engine thinks that the pattern ends at the first / and that span>/ are 6 different modifiers (most of which don't actually exist). You could either escape the delimiter like <\/span> or even better, change the delimiter (you can use pretty much anything) like '~yourPattern/Here~'.
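The two fixes can be sketched side by side. The sample input and the `#0000ff` target color are taken from the question; the variable names are illustrative, and the `@` is only there to silence the warning that the broken pattern is expected to raise.

```php
<?php
$html        = '<span style="color: #000000">2012</span>';
$replacement = '<span style="color: #0000ff">$1</span>';

// Broken: the / in </span> ends the pattern early, and the leftover
// "span>/" is read as modifiers. preg_replace warns and returns null.
$broken = @preg_replace('/<span style="color: #000000">(\d+)</span>/', $replacement, $html);

// Fix 1: keep / as the delimiter and escape it inside the pattern.
$escaped = preg_replace('/<span style="color: #000000">(\d+)<\/span>/', $replacement, $html);

// Fix 2: use a delimiter that never occurs in the pattern.
$tilde = preg_replace('~<span style="color: #000000">(\d+)</span>~', $replacement, $html);
```

With `~` as the delimiter the closing tag can be written exactly as it appears in the HTML, which tends to be easier to read.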

Edit: Since I posted this answer, two new websites have been released which try to explain regular expressions by visualising them. Right now they only support the (quite limited) JavaScript flavor, but it's a good point to start:

Martin Ender
  • Ty for the reply. I'd ended up on regular-expressions.info before, but apparently I was looking at their examples.html and dismissed them too quickly. Regex101 will also be a huge help; I haven't played with it yet, but being able to test the different variations will help out. I will check into the DOM parser. I didn't read too much into it right now because I was skipping around reading the regex links, but it does look like it is made to do exactly what I am trying to do in this instance. Also, good call on the \/. I spammed Ctrl+Z, and it turns out the second rendition that I tried would have worked. – Noname Provided Dec 22 '12 at 00:15
  • Ran out of room. I will comment back later on whether that DOM parser was able to fill this role, and update the question with the links/info. – Noname Provided Dec 22 '12 at 00:18
  • The part about regex was fine, but the third-party HTML parser you suggest is not one I would suggest. Better to link to some reference with more choice: http://stackoverflow.com/a/3577662/367456 @NonameProvided: You will find a lot of questions here on the site about the `DomDocument` and `Simplexml` PHP DOM parsers. I suggest you start with [`DomDocument`](http://php.net/DomDocument) – hakre Dec 22 '12 at 00:20
  • @hakre thanks, didn't know that question. I will use that in the future. I find that SimpleHtmlDom does the trick for most regex vs HTML questions here in SO, and it's usually really easy to integrate. that's why I linked it. – Martin Ender Dec 22 '12 at 00:22
1

http://www.macronimous.com/resources/writing_regular_expression_with_php.asp

Look at this one. It seems to cover the process pretty well.

ROY Finley
  • TY for the reply. I hadn't seen that site before, and it does help. It falls right into what I am used to and like: a list of commands at the start, then getting more complex further down. – Noname Provided Dec 22 '12 at 00:13
1

Try this website, perhaps. Personally, I'd say if you are really interested in regexes, it'd be worth getting a book like this one.

DWright
  • Also, the reason your last 2 tries aren't working, I think, is because you have a greedy * which is going to match as much as possible and go further into the string being matched than you want it to. If you throw a ? after `*`, it will make the match non-greedy and the * will match the minimum it needs to match. However the minimum is 0. I suspect you want to replace * with +?, which will match at least one character at this spot. However, try this instead: `preg_replace("/$1", $data);` That (\d+) does digits – DWright Dec 22 '12 at 00:08
  • Ty for the reply. I'd ended up on that website before, but apparently I was looking at their examples.html and dismissed them too quickly. It's not that I am interested in regexes; it's that I dread them. But for as much trouble as they have given me, the book would be worth the investment. I will check next time I am at B&N. – Noname Provided Dec 22 '12 at 00:11
  • So I grabbed that book off of iTunes. I ended up skipping from chapter 5 to 9 for the sake of time, but it's definitely good. I'd like to thank you for recommending it; I do not normally buy reference books blindly and would have continued to try to muddle through with Google. I'll say it again though: thank you. From what I've gone through in there so far, not only can I understand all those examples that left me blank-faced before, I plan to revisit a few old projects. – Noname Provided Dec 23 '12 at 06:17
  • So glad this has proved helpful. I still value my copy of 10 or 12 years ago and now it's in the 3rd edition! – DWright Dec 23 '12 at 06:19
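The greedy versus non-greedy behavior discussed in the comments above can be sketched as follows. The two-span sample string is made up for illustration, and `~` is used as the delimiter so that `</span>` needs no escaping.

```php
<?php
$html = '<span>1</span><span>2</span>';

// Greedy: .* grabs as much as it can, so the capture runs on to the
// last </span> in the string.
preg_match('~<span>(.*)</span>~', $html, $greedy);

// Non-greedy: .*? stops at the first </span> it can reach.
preg_match('~<span>(.*?)</span>~', $html, $lazy);

// Digits only: \d+ cannot cross a tag, so it stays inside one span.
preg_match('~<span>(\d+)</span>~', $html, $digits);

// .*? may match nothing at all, while .+? requires at least one character.
$emptyWithStar = preg_match('~<span>(.*?)</span>~', '<span></span>');
$emptyWithPlus = preg_match('~<span>(.+?)</span>~', '<span></span>');
```

When the content is known to be a number, the digit class is usually the tighter choice, since it describes what is expected instead of how far the match may run.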