
I have a programming problem that I'm hoping I can get some assistance with here. Basically, I need to find a way to merge array data that's being returned from 2 different RESTful APIs, sort it, and then paginate through it.

Note that I'm dealing with MLS data (i.e., Real Estate listings), and that I'm using PHP for this project (exclusively). Also, these are the 2 different APIs that I'm using:

Spark API https://sparkplatform.com/docs/overview/api

Bridge RESO Web API https://bridgedataoutput.com/docs/platform/

The problem that I'm having, specifically, is that each of these APIs has a different limit on how many records can be retrieved per request. For example, the Spark API allows me to retrieve as many as 1,000 records at a time, whereas the Bridge API allows no more than 200. I understand why these limits are in place, and it really hasn't been a problem until now. I say this because I've been asked to find a way to retrieve Real Estate listings from both of these APIs, to then merge all of the returned data from both of them into a single array, .. to then sort the listings based on list price (from highest to lowest), .. and to then paginate through them (50 listings per page).

This wouldn't be a problem if I were dealing with just one of the 2 APIs, as both of them have features that make it quite easy to both sort and paginate through the data. And when working with queries that retrieve only small amounts of data (i.e., less than 50 records total from both APIs combined), I've already implemented working methods that allow me to merge the 2 different data sets and then sort them based on list price:

$mlsdata = array_merge($mlsdatamiami,$mlsdataftlauderdale);

// Sort from highest to lowest list price. The spaceship operator avoids
// the truncation issue of returning a float subtraction from a usort callback
function price_compare($a, $b) {
    return $b['StandardFields']['ListPrice'] <=> $a['StandardFields']['ListPrice'];
}

usort($mlsdata, 'price_compare');

However, I am unfortunately NOT dealing with small data sets, and could potentially be returning as many as tens of thousands of listings from both APIs combined.

Although I've succeeded at writing some code that allows me to paginate through my new "merged" data set, this obviously only works when I'm dealing with small amounts of data (less than 1,200 records).

$finalarray = array_slice($newarray,$startposition,$perpagelimit);

foreach($finalarray as $item) {
    ...
}
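
For reference, $startposition above is just the usual page-offset calculation (here assuming the current page number arrives via the query string):

$perpagelimit = 50;
$currentpage = max(1, (int) ($_GET['page'] ?? 1));
$startposition = ($currentpage - 1) * $perpagelimit;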

But again, I'm not dealing with result sets of less than 1,200 records. So although it might appear that I'm displaying listings from highest price to lowest price on page #1 of a paginated result set, everything starts to fall apart on page #2, where the list prices are suddenly all over the place (presumably because each batch of records is only sorted relative to itself, not across the full combined result set).

I've tried running multiple queries in a for loop, pushing the data to a new array and then merging it with the initial result set ..

$miamimlsdataraw = file_get_contents($apiurl);

$miamimlsdata = json_decode($miamimlsdataraw, true);

$number_of_miami_listings = $miamimlsdata['@odata.count'];                  

$miamilistingsarray = array();

if ($miamimlsdata['@odata.count'] > 200) {
    
    // One request has already happened; this is how many additional
    // 200-record requests are still needed to cover the full count
    $number_of_miami_queries = (int) ceil($number_of_miami_listings / 200) - 1;
    
    $miami_listings_start_number = 200;
    
    for ($x = 1; $x <= $number_of_miami_queries; $x++) {                                                    
        $paramsextra = $params . "&\$skip=" . $miami_listings_start_number * $x;
        
        $apiurl = $baseurl . '/' . $dataset . '/' . $endpoint . '?access_token=' . $accesstoken . $paramsextra;         
        
        $miamimlsdataraw = file_get_contents($apiurl);

        $miamimlsdata_extra = json_decode($miamimlsdataraw, true);

        // RESO Web API (OData) responses keep the listing rows under the
        // 'value' key, so merge in just those rows, not the whole response
        $miamilistingsarray = array_merge($miamilistingsarray, $miamimlsdata_extra['value'] ?? []);
      
    }                       
    
}

// Combine the listing rows from the first response with the extra pages
$miamimlsdata = array_merge($miamimlsdata['value'], $miamilistingsarray);

With this particular experiment, I was only dealing with about 2,700 listings (from only 1 of the APIs) .. and the performance was horrendous. And when I tried writing all of the returned data to a text file on the server (rather than trying to display it in the page), it came in at a whopping 25 MB in size. Needless to say, I don't think I can reliably use this approach at all.

I've considered setting this up as a cron job, storing the array data in our database (the site is WordPress based), and then retrieving and paginating through it at runtime .. rather than querying the APIs in real time. But I now strongly suspect that this would be just as inefficient.
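
For what it's worth, the rough shape I had in mind looks something like this (a minimal sketch only; the table name, columns, and the combined-fetch function are placeholders, the cron hook would be scheduled via wp_schedule_event(), and list_price would need an index for the ORDER BY/LIMIT pagination to stay cheap):

// Scheduled refresh (e.g. hourly): pull everything from both APIs
// and bulk-replace the rows in a custom table
function refresh_listings_cache() {
    global $wpdb;
    $listings = fetch_all_listings_from_both_apis(); // hypothetical fetcher

    $wpdb->query("TRUNCATE {$wpdb->prefix}mls_listings");
    foreach ($listings as $listing) {
        $wpdb->insert($wpdb->prefix . 'mls_listings', array(
            'listing_key' => $listing['ListingKey'],
            'list_price'  => $listing['ListPrice'],
            'payload'     => wp_json_encode($listing), // everything else, as JSON
        ));
    }
}
add_action('mls_refresh_cron', 'refresh_listings_cache');

// At runtime, each page is then a cheap indexed query instead of an API call
function get_listings_page($page, $perpage = 50) {
    global $wpdb;
    return $wpdb->get_results($wpdb->prepare(
        "SELECT * FROM {$wpdb->prefix}mls_listings
         ORDER BY list_price DESC LIMIT %d OFFSET %d",
        $perpage, ($page - 1) * $perpage
    ));
}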

So .. I realize that this question was rather long-winded, but I honestly didn't know where else to turn. Is what I'm trying to do simply not possible? Or am I perhaps missing something obvious? I welcome any and all suggestions.

-- Yvan

  • It sounds to me like you need to locally store the results of the API queries. The most obvious solution would be a database. So you retrieve listings with the APIs, store them in the database, and then present them on your website by retrieving them from the database. The only thing left to do then is to regularly update the listings in the database. – KIKO Software Oct 12 '22 at 18:26
  • If you're suggesting that we write the individual pieces of data to their own columns/rows within a relational database, this would be out of the question (too cumbersome to maintain considering the potential tens of thousands of records). If we could somehow store just a single giant object/array in the database, however, I think it'd be so much easier to maintain (we could just overwrite the entire object each time). I'm just concerned about the size of the data, and wondering how it might affect the execution/load times. Also I'm guessing searching through this data would be a nightmare. – Yvan Gagnon Oct 13 '22 at 16:26
  • This must be the first time I was told that a database was too cumbersome to maintain because of the massive amount of records. It is the specific purpose of a database to deal with that efficiently. But if you don't like a database you could simply store all the data in a file. Again, I must say that it sounds weird. Too many records for a database, but not too many to show in a browser? Perhaps you should put your prejudice aside and learn about databases after all. Most web apps nowadays are based on a database. This one is screaming for one. – KIKO Software Oct 13 '22 at 16:52
  • You are missing the point. I've been working with databases for 25+ years and understand quite well how they work and what they can handle. We are interfacing with remote APIs for a reason .. which is to avoid having to store and maintain the data in our own database, and to keep the data as "real time" as possible (MLS data changes very frequently, and actually HAS been very difficult to manually maintain, which is why automation is preferred). This database storage option has come up only recently as an absolute last resort, due to the specific challenges that I've explained. – Yvan Gagnon Oct 13 '22 at 18:15

1 Answer


There is no need to merge and sort all the listings from both MLSs. Since you only need 50 listings per page and both APIs are using RESO, you can have each API return only the sorted results that you need. For example, to get the listings for page 1, you only need:

https://api.bridgedataoutput.com/api/v2/OData/Property?$orderby=ListPrice desc&$top=50

By looping through both (already sorted) result arrays simultaneously in a while loop, you can pick out the highest-priced listings one at a time and stop as soon as you have the 50 you need for the page. Note that for page N you would request the first 50 * N sorted listings from each API (e.g. $top=250 for page 5), since any given page's 50 listings could come from either feed in any proportion, and then take the last 50 of the merged sequence.
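
Here is a rough sketch of that merge in PHP for an arbitrary page number (the two fetch functions are hypothetical placeholders: each is assumed to return rows already sorted by ListPrice descending and normalized to a common shape, using $orderby/$top as above on the Bridge side and the equivalent sort/limit parameters on the Spark side; check each API's docs for the exact parameter names):

$page = 5;                 // page to display
$perpage = 50;
$need = $page * $perpage;  // fetch this many sorted rows from EACH feed

// Hypothetical fetchers, e.g. Bridge with $orderby=ListPrice desc&$top=250
$bridge = fetch_bridge_sorted($need);  // e.g. [['ListPrice' => 950000, ...], ...]
$spark  = fetch_spark_sorted($need);   // same normalized shape

$merged = array();
$i = 0;
$j = 0;

// Two-pointer merge: repeatedly take the higher-priced head of the two
// sorted lists until we have enough rows to cover the requested page
while (count($merged) < $need && ($i < count($bridge) || $j < count($spark))) {
    $takeBridge = $j >= count($spark)
        || ($i < count($bridge) && $bridge[$i]['ListPrice'] >= $spark[$j]['ListPrice']);
    $merged[] = $takeBridge ? $bridge[$i++] : $spark[$j++];
}

// The requested page is the last 50-row slice of the merged sequence
$pagelistings = array_slice($merged, ($page - 1) * $perpage, $perpage);

The key point is that each API does the sorting server-side, so at most 2 * 50 * N small records cross the wire per page view, instead of the entire combined data set.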

-- qraft