How to Grab this using Regex?

Question

How can I grab only the names J.J. Abrams and Pippa Anderson with a regex?

 <header class="ipl-header">
        <div class="ipl-header__content">        
        <h4 name="producers" id="producers" class="ipl-header__content ipl-list-title">
            Produced by
        </h4>
</div>
        <a class="ipl-header__edit-link" href="https://contribute.imdb.com/updates?update=tt2527336:producers">Edit</a>
    </header>

    <table class="subpage_data spFirst crew_list">
        <tbody>
                    <tr class="even">
                        <td class="name">
                            <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
                        </td>
                            <td>...</td>
                            <td>executive producer</td>
                    </tr>
                    <tr class="odd">
                        <td class="name">
                            <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
                        </td>
                            <td>...</td>
                            <td>co-producer</td>
                    </tr>


                    </tbody>
                    </table>

I tried using this code, but it's not working. Please help me fix it. Thanks.

$arr['producers'] = $this->match_all_key_value('/<td class="name"><a.*?>(.*?)<\/a>/ms', $this->match('/Produced by<\/a><\/h4>(.*?)<\/table>/ms', $html, 1));
$arr['producers'] = array_slice($arr['producers'], 0, 5);
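For reference, here is a minimal sketch of the same two-step idea written with plain preg_match() and preg_match_all(), since the asker's match() and match_all_key_value() helpers are not shown. It assumes $html holds markup shaped like the snippet above. Against that markup, both original patterns miss: "Produced by" is followed by whitespace and </h4>, not </a></h4>, and a line break plus indentation sits between <td class="name"> and <a. The sketch allows \s* at both spots.

```php
<?php
// Sketch only: plain preg_* calls stand in for the asker's own (unshown)
// match()/match_all_key_value() helpers.
// Assumption: $html is shaped like the markup in the question.
$html = <<<'HTML'
<h4 name="producers" id="producers" class="ipl-header__content ipl-list-title">
    Produced by
</h4>
</div>
<table class="subpage_data spFirst crew_list">
    <tbody>
        <tr class="even">
            <td class="name">
                <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
            </td>
        </tr>
        <tr class="odd">
            <td class="name">
                <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
            </td>
        </tr>
    </tbody>
</table>
HTML;

$producers = [];
// Step 1: isolate the producers block. "Produced by" is followed by
// whitespace and </h4> (no </a>), so allow \s* before </h4>.
if (preg_match('/Produced by\s*<\/h4>(.*?)<\/table>/s', $html, $block)) {
    // Step 2: a newline and indentation sit between <td class="name"> and <a,
    // so allow \s* there too; [^>]* skips the attributes, including the
    // line break before the tag's closing ">".
    preg_match_all('/<td class="name">\s*<a[^>]*>(.*?)<\/a>/s', $block[1], $m);
    // Keep at most five names, as the original code does.
    $producers = array_slice($m[1], 0, 5);
}
print_r($producers);
```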

C Miller · Answer 1 · 2017-12-24T07:22:49.067

0

Here is one possible solution:

preg_match_all( "#<a href=\"/name/.*?>(.*?)</a>#is", $html, $results );
$arr['producers'] = array_pop( $results );
print_r( $arr['producers'] );

It looks for links whose href starts with /name, then grabs everything inside the link tags. This assumes there aren't any other links on the page whose paths start with /name that you don't want. If there are, you may have to tweak that part of the expression to be more specific.
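A self-contained sketch of this approach, assuming $html is a trimmed copy of the question's markup (on a full page, any other /name/ links would be captured as well):

```php
<?php
// Assumption: $html is a trimmed copy of the question's table cells.
$html = <<<'HTML'
<td class="name">
    <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
</td>
<td class="name">
    <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
</td>
HTML;

// "#" is the delimiter, so "/" needs no escaping. i = case-insensitive,
// s = "." also matches newlines (needed for the line break before ">").
preg_match_all('#<a href="/name/.*?>(.*?)</a>#is', $html, $results);

// $results[0] holds the full matches and $results[1] the captured names;
// array_pop() returns the last element, i.e. $results[1].
$producers = array_pop($results);
print_r($producers);
```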

edited Dec 24 '17 at 07:22

answered Dec 24 '17 at 06:24

C Miller


I'm used to WordPress, so I don't need print_r( $producers ); – Adeeva Ameera Dec 24 '17 at 07:15
The print_r is just for testing, to see the contents of what was captured. It wouldn't be used in an actual PHP script. Did the rest help? – C Miller Dec 24 '17 at 07:23

score 0 · Accepted Answer · answered Dec 24 '17 at 10:41

Parsing HTML is really a task for a DOM parser such as PHP Simple HTML DOM Parser or DOMDocument. This answer explains why.
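As a sketch of the DOM-parser route, the snippet below uses PHP's built-in DOMDocument together with DOMXPath (PHP Simple HTML DOM Parser is a third-party library and is not shown here). It assumes $html holds the question's table markup; the XPath query is written for that structure, and on a full page with several similar tables it would need to target the producers table more specifically.

```php
<?php
// Assumption: $html holds markup shaped like the question's table.
$html = <<<'HTML'
<table class="subpage_data spFirst crew_list">
    <tbody>
        <tr class="even">
            <td class="name">
                <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
            </td>
            <td>executive producer</td>
        </tr>
        <tr class="odd">
            <td class="name">
                <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
            </td>
            <td>co-producer</td>
        </tr>
    </tbody>
</table>
HTML;

$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of
// printing them, then discard them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$producers = [];
// Every <a> that is a direct child of a <td class="name"> inside the
// crew_list table.
$query = '//table[contains(@class, "crew_list")]//td[@class="name"]/a';
foreach ($xpath->query($query) as $link) {
    $producers[] = trim($link->textContent);
}
print_r($producers);
```

Unlike the regex versions, this does not care about line breaks or indentation inside the markup.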

If you want to do it with a regex, another option (when running PHP 5.2.4 or later) is to use \K in your pattern.

What you could do is match everything right before the data you are looking for, reset the starting point of the reported match using \K, match the data you are looking for, and use a positive lookahead for the closing anchor tag.

<td class="name">\n\s+<a[^>]+>\K.*(?=<\/a>)

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>\K.*(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

The names would then be in $matches[0].
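A runnable sketch of the \K pattern, assuming $html uses "\n" line endings and the same layout as the question's markup (the literal \n in the pattern would not match "\r\n" line endings):

```php
<?php
// Assumption: "\n" line endings and the question's indentation.
$html = <<<'HTML'
            <td class="name">
                <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
            </td>
            <td class="name">
                <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
            </td>
HTML;

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>\K.*(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

// \K drops everything matched before it from the reported match, so each
// full match in $matches[0] is just the name.
print_r($matches[0]);
```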

Explanation

Match <td class="name">
Match a newline \n
Match one or more whitespace characters \s+
Match <a
Match any character that is not a > one or more times [^>]+
Match >
Reset the starting point of the reported match with \K
Match any character except a newline zero or more times .*
Start a positive lookahead (?=
Assert that what follows is </a> <\/a>
Close the positive lookahead )


Without \K, you could capture your values in a capturing group such as (.*).

The regex would then look like:

<td class="name">\n\s+<a[^>]+>(.*)(?=<\/a>)

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>(.*)(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

The names would then be in $matches[1].
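A runnable sketch of the capturing-group variant, under the same assumption of "\n" line endings and the question's layout. Here $matches[0] holds each full match starting at <td class="name">, so the names come from the group in $matches[1].

```php
<?php
// Assumption: "\n" line endings and the question's indentation.
$html = <<<'HTML'
            <td class="name">
                <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
            </td>
            <td class="name">
                <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
            </td>
HTML;

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>(.*)(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

// Group 1 captures only the link text; the lookahead keeps </a> out of it.
print_r($matches[1]);
```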

