0

Im using Postgres 9.3 on MacOSX.

I would be very happy if anyone could point me in the right direction here. I would like to write a function which connects to an existing (online) database (e.g. this one) and retrieves data (in this case shapefiles) using an input file with appropriate strings (in this case MRGIDs). Im sorry I don't have any code, I literally don't know where to start and I don't seem to find any threads on it. Maybe SQL isn't the way to go here?

Input file example;

species,mrgids
Sp1,4279
Sp1,8366
Sp1,21899
Sp1,21834
Sp1,7130
Sp1,1905
Sp1,21900
Sp1,5679
Sp1,5696

Thanks!

jO.
  • 3,384
  • 7
  • 28
  • 38
  • 3
    oh, that site uses a restful api, too... use that... `http://www.marineregions.org/rest/getGazetteerRecordByMRGID.json/5696/` where `5696` is the MRGID. [see their documentation](http://www.marineregions.org/gazetteer.php?p=webservices), they show a lot of very useful examples if you go through all of it. – gloomy.penguin Dec 04 '13 at 08:16

2 Answers2

2

This is almost certainly done best outside the database, using a script in your choice of language. I'd use Python and psycopg2, but things like Ruby + the Pg gem, Perl and DBI / DBD::Pg, or even PHP and PDO, are equally reasonable choices.

Your code can do an HTTP fetch, then (if it's CSV-like) use the COPY command to load the data into PostgreSQL. If it's a shapefile, you can feed the data to PostGIS's shp2pgsql loader, or make individual INSERTs using the GeomFromText PostGIS function.

You could do the HTTP fetch from a PL/Pythonu or PL/Perlu stored procedure, but there'd be no real benefit over just doing it from a script, and it'd be more robust as an external script.

So, really, you need to split this problem up.

You'll need code to talk to the website(s) of interest, possibly including things like HTTP POSTs to submit forms. Or, preferably, use a web API for the site(s) that's designed for automatic scripted interaction. Most simple RESTful APIs are easy to use from scripting languages using libraries like Perl's LWP, Python's httplib, etc. In the case of the site you linked to, as user623952 mentioned, there's a RESTful API.

Then you'll need code to fetch the data of interest, and code to read the fetched data and load it into PostgreSQL. You might want to download all the data then load it, or you may want to stream it into the database as it's downloaded (pipe to shp2pgsql, etc).

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • @Jo It has a bit of a learning curve, but PostGIS will be a lifesaver if you're working with geographical data. Strongly recommended. – Craig Ringer Dec 04 '13 at 08:38
  • Cheers! Will probably go nice together with the different map libraries in R too. – jO. Dec 04 '13 at 08:44
1

this a very basic example with with PHP and CURL

I used your input file exactly and saved it as input.txt

species,mrgids
Sp1,4279
Sp1,8366
Sp1,21899
Sp1,21834
Sp1,7130
Sp1,1905
Sp1,21900
Sp1,5679
Sp1,5696

and this is the PHP and CURL doing its stuff:

<?php 
$base_url = "http://www.marineregions.org/rest/getGazetteerRecordByMRGID.json/%s/";

// just get the input file into an array to use 
$csv = read_file("input.txt"); 

// if you want to see the format of $csv    
print "<pre>".print_r($csv,true)."</pre>";


// go through each csv item and run a curl request on it 
foreach($csv as $i => $data)
{
    $mrgids = $data['mrgids']; 

    $url = sprintf($base_url,$mrgids);

    $response = run_curl_request($url);  

    if ($response!==false)
    {
        //http://us2.php.net/manual/en/function.json-decode.php
        $json = json_decode($response,true); 

        if (!is_null($json))
        {
            // this is where you would write the code to stick this info in 
            // your DB or do whatever you want with it...
            print "<pre>$url \n".print_r($json,true)."\n\n</pre>";
        }
        else 
        {
            print "error:  response was not proper JSON for $url <br/><br/>";
            print $response."<br/><br/><br/>"; 
        }
    }
    else
    {
        print "error:  response was false for $url <br/><br/>";
    }

}


function read_file($filename, $has_headers=true, $assoc=true) 
{
    $headers = array(); 
    $row = 1;

    if (($handle = fopen($filename, "r")) !== FALSE) 
    {
        $return = array(); 

        if ($has_headers) 
        {
            if (($data = fgetcsv($handle, 1000, ",")) !==false)
            {
                $headers = $data; 
            }
        }
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
        {
            if ($assoc) 
            {
                $temp = array(); 
                foreach($headers as $hi => $header)
                {
                    $temp[$header] = (isset($data[$hi])) ? $data[$hi] : ''; 
                }
                $return[] = $temp; 
            }
            else 
            {
                $return[] = $data; 
            }
        }   
        fclose($handle);
    }
    else
    {
        $return = false; 
    }

    return $return; 
}


// requires PHP CURL extension 
// http://php.net/manual/en/function.curl-setopt.php
function run_curl_request($url)
{ 
    // create curl resource 
    $ch = curl_init();  

    $defaults = array( 
        CURLOPT_POST => false, 
        CURLOPT_HEADER => false, 
        CURLOPT_URL => $url, 
        CURLOPT_FRESH_CONNECT => true, 
          CURLOPT_FAILONERROR => true,  
        CURLOPT_RETURNTRANSFER => true, 
        CURLOPT_FORBID_REUSE => true, 
        CURLOPT_TIMEOUT => 4 
    ); 

   curl_setopt_array($ch, $defaults); 


    // $output contains the output string 
    $output = curl_exec($ch); 

    // close curl resource to free up system resources 
    curl_close($ch);  

    return $output; 
}


?>

And if it worked, you get a bunch of these as output:

http://www.marineregions.org/rest/getGazetteerRecordByMRGID.json/4279/ 
Array
(
    [MRGID] => 4279
    [gazetteerSource] => IHO 23-3rd: Limits of Oceans and Seas, Special Publication 23, 3rd Edition 1953, published by the International Hydrographic Organization.
    [placeType] => IHO Sea Area
    [latitude] => 39.749996185303
    [longitude] => 5.0942182540894
    [minLatitude] => 35.071937561035
    [minLongitude] => -6.0326728820801
    [maxLatitude] => 44.42805480957
    [maxLongitude] => 16.221109390259
    [precision] => 1079464.0796258
    [preferredGazetteerName] => Mediterranean Sea - Western Basin
    [preferredGazetteerNameLang] => English
    [status] => standard
    [accepted] => 4279
)

notes:

Community
  • 1
  • 1
gloomy.penguin
  • 5,833
  • 6
  • 33
  • 59
  • Oh... okay... so what are you doing with this data? sticking it in your DB? make sure to use `pdo` or `mysqli`... and look for posts like this: [PHP PDO batch insert from multi dimensional array failing to insert into MySQL](http://stackoverflow.com/questions/13507496/).. anything with inserting with an associative array. instead of outputting the array to the screen, that's where your insert should take place. to make it easy on yourself, make the column names the same as their field names. – gloomy.penguin Dec 04 '13 at 09:32
  • Yeah, guess I would use PostGIS and import the data into my Postgres database. Then the plan is to use R for geo-statistics. Im familiar with R but I found [a good thread](http://gis.stackexchange.com/questions/64950/which-is-the-best-way-of-working-with-postgis-data-in-r) on the workflow from Postgres to R – jO. Dec 04 '13 at 09:37
  • proooobably lol... i had fun writing that example anyway.....! I don't know R or PostGIS. No matter what, though, use that RESTful API the site has. It will make your life a lot easier. The response is JSON, so you should probably look into that, too. – gloomy.penguin Dec 04 '13 at 09:38
  • The data retrieved in Json format, does that contain the same information as a shapefile of the same region? Im a noob here, so Im not sure how to continue from having this data, potentially put back in my database and used for geostats? Any idea? Cheers – jO. Dec 06 '13 at 00:27