
I'm practicing using cURL to get US congressional district information from this site:

https://ziplook.house.gov/zip/ziplook.html

I checked the field names in the inspector and they appear correct.

When I enter sample ZIP codes on the site itself (as a baseline reference), I get a different answer than my code returns.

For example, if you enter just 77357 on the site, you get three US Congressmen. If you narrow it down by adding the ZIP+4 value 3016, you should get only one name.

When I run my code with 77357 and 3016, I keep getting all three Congressmen rather than just the one (Kevin Brady).

It's as if the POST fields in $fields_string ignore the zip4 value from the $fields array.
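To check whether zip4 is actually part of the request body, the encoded string can be printed before it is posted. A minimal sketch, using the same field names as the code below; if the output contains `zip4=3016`, the field is being sent and the difference lies somewhere else in the environment.

```php
<?php
// Build the same POST body the question's code sends, then inspect it.
$fields = [
    'zip'       => '77357',
    'zip4'      => '3016',
    'thru_zip4' => '3016',
];
$fields_string = http_build_query($fields);
echo $fields_string, PHP_EOL;
// prints: zip=77357&zip4=3016&thru_zip4=3016
```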

Also, although the page does output the members' names, the error log records an error six times, reported from line 166 (which doesn't exist) and pointing to line 131.

This is the error: `Trying to get property 'nodeValue' of non-object in`

This is the line of code: `if($cols->item(0)->nodeValue !=''){`

```php
<?php
$url = 'https://ziplook.house.gov/htbin/ziplook_find';

$userzip   = "77357";   // the 5-digit ZIP code
$userzip4  = "3016";    // the 4-digit ZIP+4 extension
$user_thru = $userzip4; // the 4-digit range end can just be the same as zip4

// The three form fields on the lookup page
$fields = array(
    'zip'       => $userzip,
    'zip4'      => $userzip4,
    'thru_zip4' => $user_thru,
);
$fields_string = http_build_query($fields);

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_POST, TRUE);
curl_setopt($curl, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($curl);
if ($data === false) {
    // Report transfer failures (e.g. SSL problems) instead of parsing nothing
    die('cURL error: ' . curl_error($curl));
}
curl_close($curl);

// Strip control characters and all non-ASCII bytes before parsing
$cleanhtml = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $data);

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // only takes effect if set before loading
$dom->loadHTML($cleanhtml);

$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$link = '';

foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    $links = $row->getElementsByTagName('a');
    foreach ($links as $a) {
        $link = $a->getAttribute('href'); // keeps the last link found in the row
    }

    // This is the line that raises "Trying to get property 'nodeValue' of non-object"
    if ($cols->item(0)->nodeValue != '') {
        echo 'ZIP CODE: ' . $cols->item(0)->nodeValue . '<br />';
        echo 'MEMBER: ' . $cols->item(1)->nodeValue . '<br />';
        echo 'MEMBER-LINK: ' . $link . '<br />';
        echo 'STATE/DISTRICT: ' . $cols->item(2)->nodeValue . '<br />';
        echo 'ROOM: ' . $cols->item(3)->nodeValue . '<br />';
        echo 'PHONE: ' . $cols->item(4)->nodeValue . '<br />';
        echo '<hr />';
    }
}
```
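The nodeValue warning appears when a row has no `<td>` cells (for example, a header row built from `<th>` cells): `$cols->item(0)` then returns `null`, and reading `->nodeValue` from `null` triggers the error. A minimal sketch of a guarded parser, assuming each member row has the five cells that the echo statements above read; `parseMemberRows` is a hypothetical name. Unlike the original loop, it takes the first link in each row and resets it per row, so a row without a link never reuses the previous row's URL.

```php
<?php
/**
 * Extract member rows from the ziplook result HTML.
 * Assumption: data rows have at least five <td> cells
 * (zip, member, state/district, room, phone); other rows are skipped.
 */
function parseMemberRows(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;  // must be set before loading
    libxml_use_internal_errors(true);  // collect, rather than print, HTML parse warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $members = [];
    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return $members; // no table in the response
    }

    foreach ($table->getElementsByTagName('tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        // Skip rows that would make $cols->item(n) return null
        if ($cols->length < 5 || trim($cols->item(0)->nodeValue) === '') {
            continue;
        }
        $anchor = $row->getElementsByTagName('a')->item(0);
        $members[] = [
            'zip'      => trim($cols->item(0)->nodeValue),
            'member'   => trim($cols->item(1)->nodeValue),
            'link'     => $anchor ? $anchor->getAttribute('href') : '',
            'district' => trim($cols->item(2)->nodeValue),
            'room'     => trim($cols->item(3)->nodeValue),
            'phone'    => trim($cols->item(4)->nodeValue),
        ];
    }
    return $members;
}
```

Calling `parseMemberRows($cleanhtml)` in place of the DOM loop returns one array per member, which also makes sorting or storing the results easier than echoing them directly.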
  • Either there is no such node or it has a different format – Your Common Sense Jul 16 '22 at 07:10
  • The CURL code you have shown here causes an error when I tested it because it does not return any content. The endpoint is `https` yet you have no configuration parameters to handle SSL connections. When this is configured properly it is straightforward to find the elements of interest using XPath – Professor Abronsius Jul 16 '22 at 07:50
  • Thanks. First, I had help on Fiverr with this (low experience, short on time); I'm learning more now. It's running on an SMF (Simple Machines) forum, and it turns out that if I include_once SSI.php (to validate a user ID before allowing use of the page), I get ALL the congressmen. If I don't include SSI.php, it works as it should, though I still get the node errors. I'm starting to understand the process and think I should start over. Someone even said it should be a GET rather than a POST, though I'm not sure. Sorting and storing the results is now more intimidating than the actual cURL request. –  Jul 16 '22 at 23:16
  • Actually, the cURL request works fine. I fixed the nodeValue error, and it's working great. It's just not compatible with something inside the SSI.php that comes with my forum software (which I use to validate a user ID before loading the page). At least I know it works fine on a blank .php page. I'll write a better question next time. –  Jul 18 '22 at 20:43
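Following the XPath suggestion in the comments, here is a hedged sketch of the same extraction with `DOMXPath`. It assumes the results sit in the first `<table>` of the response and that member rows have at least five `<td>` cells; `memberRowsXPath` is a hypothetical name. Filtering rows in the XPath expression means header rows never reach the code that reads cell values.

```php
<?php
// Select only rows with at least five <td> cells, so no null nodes are read.
function memberRowsXPath(string $html): array
{
    if ($html === '') {
        return [];
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect, rather than print, HTML parse warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $out = [];
    // Assumption: the first <table> in the response holds the results.
    foreach ($xpath->query('(//table)[1]//tr[count(td) >= 5]') as $row) {
        $cells = [];
        foreach ($xpath->query('./td', $row) as $td) {
            $cells[] = trim($td->textContent);
        }
        // string() yields the first matching href, or '' if the row has no link
        $href = $xpath->evaluate('string(.//a/@href)', $row);
        $out[] = ['cells' => $cells, 'link' => $href];
    }
    return $out;
}
```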
