while ($row = mysql_fetch_row($result)) {
    preg_match('#<span id="lblNumerZgloszenia" style="font-weight:bold;font-style:italic;">([^<]*)<\/span>#', $row[1], $matches);
    $query2 = 'UPDATE content_pl SET kategoria_data='.$matches[1].' WHERE id='.$row[0].';';
    mysql_query($query2);
}

I'm using this preg_match to get the span's contents into the $matches array. When I do print_r($matches), it shows the right results, but when I use $matches[1], the browser tells me that there is no such index.

EDIT: print_r shows

[...]Array ( [0] => TOW:   (210) 252250, (220) 01-07-2002 [1] => TOW:   (210) 252250, (220) 01-07-2002 ) Array ( [0] => TOW:   (210) 252251, (220) 01-07-2002 [1] => TOW:   (210) 252251, (220) 01-07-2002 ) Array ( [0] => TOW:   (210) 252252, (220) 01-07-2002 [1] => TOW:   (210) 252252, (220) 01-07-2002 ) Array ( [0] => TOW:   (210) 252253, (220) 01-07-2002 [1] => TOW:   (210) 252253, (220) 01-07-2002 )[...]
Sky

2 Answers


You're doing this in a while loop, which means it runs more than once. If you just do print_r($matches); exit; you'll probably see exactly what you expect, but that's only the first iteration of your loop.

Most likely, there is at least one row where preg_match finds no match, so $matches[1] is never set. You should wrap your second mysql_query call (which is deprecated, BTW; you might want to switch to PDO while your project is still small) in an if statement that checks the return value of your preg_match call, and only run the query if preg_match returns a value greater than 0.
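A minimal sketch of that guard, assuming the regex from the question is kept. The helper name extractNumerZgloszenia and the $rows array (a stand-in for the rows returned by mysql_fetch_row) are hypothetical, added only for illustration:

```php
<?php
// Hypothetical helper: returns the span's text, or null when the row has no such span.
function extractNumerZgloszenia(string $html): ?string
{
    $pattern = '#<span id="lblNumerZgloszenia" style="font-weight:bold;font-style:italic;">([^<]*)</span>#';
    // preg_match() returns 1 on a match, 0 when nothing matches, false on error;
    // $matches[1] only exists in the first case.
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    return $matches[1];
}

// Hypothetical sample rows standing in for the database result set.
$rows = [
    [1, '<span id="lblNumerZgloszenia" style="font-weight:bold;font-style:italic;">TOW:   (210) 252250, (220) 01-07-2002</span>'],
    [2, ''], // an empty HTML field, like the one that broke the original loop
];

foreach ($rows as $row) {
    $value = extractNumerZgloszenia($row[1]);
    if ($value === null) {
        // No match: skip the UPDATE instead of reading a missing $matches[1].
        echo "row {$row[0]}: no match, skipping UPDATE\n";
        continue;
    }
    echo "row {$row[0]}: {$value}\n";
}
```

Rows that print "no match" are the ones worth inspecting; they point to a data problem or a regex that is too strict, not a PHP bug.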

Colin M
  • Actually, all my fields are in a database and there is no null field (there is a pure HTML field), and I've got all my matches in my array. I don't think that's the mistake. – Sky Dec 09 '12 at 17:12
  • Right, but what I'm saying is that - are you sure EVERY single row that is being fetched by your query contains the search string in question? I can answer that for you: it doesn't. There is a row somewhere that does not contain the string. If you wrap your query in the if statement that I mentioned, you'll see that you don't get the error anymore. That means that `preg_match` failed to find a match on at least one row. – Colin M Dec 09 '12 at 17:14
  • I'm trying with the if statement, but anyway, m.buettner's library will help me do it faster. Thanks for your help. Edit: It still doesn't work with the if statement; it acts as if $matches were empty and doesn't run any query. – Sky Dec 09 '12 at 17:16
  • @Brut4lity The if statement wasn't intended to fix your problem, it was to show you that there was a problem. That is my point. If the query is not being run, `preg_match` is not finding a result in one of your rows. That's an issue with your data or with your regular expression, but not an issue with PHP. If you could possibly show us the row that doesn't match, we may be able to better help. – Colin M Dec 09 '12 at 17:23
  • Actually, when I count the $matches and the rows, they're exactly equal. That makes me think `preg_match` found a result in each row (logically). I also tried with m.buettner's library and my query did not work either (too much execution time). – Sky Dec 09 '12 at 17:37
  • Okay, I think you're misunderstanding. Here's what I'm getting at. Put this at the top of your while loop: `if (/* your preg_match here*/ == false) { var_dump($row); exit; }` and see if you get a row printed to the screen when you run that – Colin M Dec 09 '12 at 18:06
  • Oh right, the field in row 39006 is empty. That's why it wouldn't work. Guess you're right ;) ! But then, why wouldn't it work either when I enclosed my query in the if(preg_match)? Is there another mistake? – Sky Dec 09 '12 at 18:32
  • Actually, I found my error, and it was a VERY simple one: I couldn't save my results because I wasn't putting quotes around the string values in my SQL. PHP reported no error and nothing showed up on the SQL side either, but the database wouldn't accept my results. Thanks a lot for your help! – Sky Dec 09 '12 at 18:57
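The root cause turned out to be unquoted string values in the UPDATE. A prepared statement with bound parameters avoids hand-quoting entirely. This is a minimal sketch, assuming PDO (as Colin suggested) and, only so the demo is self-contained, an in-memory SQLite database standing in for the real MySQL server; the helper name updateKategoria is hypothetical, while the table and column names come from the question:

```php
<?php
// Hypothetical helper: writes the extracted value with a prepared statement,
// so the driver handles quoting and escaping of the string.
function updateKategoria(PDO $pdo, int $id, string $value): bool
{
    $stmt = $pdo->prepare('UPDATE content_pl SET kategoria_data = :value WHERE id = :id');
    return $stmt->execute([':value' => $value, ':id' => $id]);
}

// Demo setup: in-memory SQLite stands in for the real MySQL database (assumption).
$pdo = new PDO('sqlite::memory:');
// Throw on SQL errors instead of failing silently, as the original mysql_query() call did.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE content_pl (id INTEGER PRIMARY KEY, kategoria_data TEXT)');
$pdo->exec('INSERT INTO content_pl (id) VALUES (1)');

updateKategoria($pdo, 1, 'TOW:   (210) 252250, (220) 01-07-2002');

echo $pdo->query('SELECT kategoria_data FROM content_pl WHERE id = 1')->fetchColumn(), "\n";
```

With a MySQL DSN in place of sqlite::memory:, the same prepare/execute calls apply; only the connection line changes.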

Let me show you a better approach than parsing HTML with regex. Here is a convenient library (PHP Simple HTML DOM Parser) that does the parsing for you. The code becomes really simple (and readable) with this:

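require_once 'simple_html_dom.php';

$html = new simple_html_dom();
$html->load($row[1]);

$span = $html->find('span[id=lblNumerZgloszenia]', 0);
if ($span !== null) { // find() returns null when the span is missing
    // quote and escape the string value, otherwise MySQL rejects the UPDATE
    $data = mysql_real_escape_string($span->innertext);
    $query2 = "UPDATE content_pl SET kategoria_data='".$data."' WHERE id=".(int)$row[0].";";
}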

If you cannot use a 3rd-party library for some reason, you can do something similar with PHP's built-in DOM extension. It will not be quite as elegant, but it is still much more robust and readable than a regex.
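A rough sketch of the built-in DOM route, assuming DOMDocument::loadHTML with libxml warnings suppressed for sloppy markup. The helper name extractSpanText is hypothetical:

```php
<?php
// Hypothetical helper: finds a <span> by id using only PHP's built-in DOM extension.
function extractSpanText(string $html, string $id): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input (ValueError in PHP 8)
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML without warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare the id attribute directly, so attribute order or extra attributes don't matter.
    foreach ($doc->getElementsByTagName('span') as $span) {
        if ($span->getAttribute('id') === $id) {
            return $span->textContent; // text of the span, including any nested tags' text
        }
    }
    return null;
}
```

One caveat: loadHTML may misread non-ASCII (e.g. Polish) text unless the markup declares its charset, so that is worth checking on real rows.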

Martin Ender
  • That's a bit unnecessary if all the OP needs to do is strip one element out of some text. Libraries like that tend to be a large performance hit for not much functionality gain in a simple search scenario. This also doesn't really answer the OP's question. And, the document may not even be valid HTML. – Colin M Dec 09 '12 at 17:10
  • @ColinMorelli I believe the reason why it doesn't work in some case might also be that the second attribute isn't exactly the same, or that there are nested `<` inside the span tag. All reasons not to use regex. – Martin Ender Dec 09 '12 at 17:12
  • Thanks, I'm going to use it; it might be very helpful, since I'm kind of a newbie with regexps. @Colin: Actually, it's a local script, so I'll just delete it once my database is cleaned. – Sky Dec 09 '12 at 17:13
  • @m.buettner True, but you can also solve this with additional regexes as long as you have at least one constant that you can count on. I just don't like the thought of including a massive DOM parser for a simple string search. Seems like overkill to me. – Colin M Dec 09 '12 at 17:23
  • @ColinMorelli attempting to parse HTML with regex is just wrong. And highly unstable. And completely unmaintainable once you try to catch every eventuality. If the HTML is invalid, regexes are even more likely to be screwed up than a DOM parser. – Martin Ender Dec 09 '12 at 17:36
  • @m.buettner : How long does it take to parse a lot of fields with a lot of HTML in them? My query was taking less than 30 seconds, and it can't finish with this library. I think I won't be able to use it for my project. – Sky Dec 09 '12 at 17:47
  • @Brut4lity did you try on a smaller subset to make sure it's really just slow? obviously, I can't predict what the performance trade-off on your data would be ;) – Martin Ender Dec 09 '12 at 17:50
  • @m.buettner : It's really just slow. I launched the query 10 min ago and it's still working. I set a 30 min limit for the query; hope it won't fail. Thanks for your help anyway. – Sky Dec 09 '12 at 17:56
  • @m.buettner That's where you and I disagree. He is not trying to *parse* the HTML. He's trying to get one value out of what is effectively a string. Using regex makes sense for the same reason you wouldn't use `preg_replace` on a single word when `str_replace` would do the job much more effectively. There are ALWAYS performance/feature tradeoffs. This, IMO, is not worth the performance hit. In any case, that's as much as I'll discuss via comments. – Colin M Dec 09 '12 at 18:03
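To make the "one constant you can count on" idea from the thread concrete, here is a hedged sketch of a more tolerant regex that anchors only on the id attribute, so a changed style attribute or attribute order no longer breaks the match. The helper name extractById is hypothetical. It still fails on nested tags inside the span (Martin's point), and `\bid=` would also match an attribute like data-id, so it is a middle ground, not a general HTML parser:

```php
<?php
// Hypothetical helper: regex anchored on the id attribute only.
function extractById(string $html, string $id): ?string
{
    // [^>]* lets other attributes appear before or after id; [^<]* means nested tags are NOT supported.
    $pattern = '#<span\b[^>]*\bid="' . preg_quote($id, '#') . '"[^>]*>([^<]*)</span>#i';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    return $matches[1];
}
```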