We are using a third-party web app that does not have an API yet. It is basically a membership registration website, and each member belongs to a specific category.

I need to use these categories in our internal system. So far I have been manually adding each category to a drop-down menu in a form as soon as a new category is created in the third-party app.

Since there is no API, I am wondering whether it is possible to crawl the page of the third-party app that contains the drop-down menu and copy the entire menu over to our internal site.

I wish I could show you the efforts I have made so far, but I am stuck on how to even begin. I did search online, but all I could find was how to copy a drop-down within the same page.

Any push in the right direction would be really helpful. The technologies I am working with are PHP and JS.

Saadia

1 Answer

I don't think CORS is going to help you here. Its function is to let a server opt in to sharing its resources with pages on other domains, so it only helps when the site serving the resource sends the appropriate headers, and the third-party app is not under your control.

If there is no API for the data you need, you are almost certainly limited to scraping the data out of the web page. You can do this by first issuing a request for the page to obtain the html, then searching/parsing the html to find the drop-down menu, then finally parsing the menu items to obtain a list that you can use for your own drop-down.

So, some pointers:

Obtain page html - See PHP: how can I load the content of a web page into a variable?

Parse html - See PHP Parse HTML code

Of course how easy this ends up being depends on many factors, e.g.

  • Can you just request the page containing the drop-down, or does the web app need authentication? You may need to refine the cURL request as appropriate.
  • Can you easily identify the html drop-down, e.g. by a unique id attribute? If so, you could use DOMDocument::getElementById; otherwise you may need more complex logic to parse the page html and find the menu.
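
If the drop-down has no usable id, an XPath query is one way to locate it. The following is a minimal sketch, assuming the menu can be identified by its name attribute. The helper names `findSelectByName` and `optionsOf`, and the name `category` used below, are hypothetical and not taken from the third-party app.

```php
<?php
// Hypothetical helper: find a <select> by its name attribute when it has no id.
function findSelectByName(string $html, string $name): ?DOMElement
{
    // Collect libxml parse warnings internally, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $name is chosen by you and contains no double quotes.
    $nodes = $xpath->query('//select[@name="' . $name . '"]');
    if ($nodes === false) {
        return null;
    }
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

// Hypothetical helper: turn the <option> children into value/label pairs.
function optionsOf(DOMElement $select): array
{
    $options = [];
    foreach ($select->getElementsByTagName('option') as $option) {
        $options[] = [
            'value' => $option->getAttribute('value'),
            'label' => trim($option->textContent),
        ];
    }
    return $options;
}
```

Calling `findSelectByName($html, 'category')` on the fetched page would return the matching element, or null if the page has no such menu, and `optionsOf()` then gives a plain array you can reuse.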

Either way, it should be possible to achieve - just remember that the third-party app is not under your control, and as such may be subject to changes that break your program.
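
One way to soften such breakage is to keep the last list you scraped successfully and fall back to it when a scrape comes back empty. This is only a sketch under assumptions: the function name `categoriesWithFallback`, the JSON cache file, and the value/label array shape are all hypothetical choices, not something the third-party app dictates.

```php
<?php
// Hypothetical helper: remember the last good list on disk and reuse it
// when a scrape returns nothing (for example after a page redesign).
function categoriesWithFallback(array $scraped, string $cacheFile): array
{
    if ($scraped !== []) {
        // Fresh data: refresh the cache and use it.
        file_put_contents($cacheFile, json_encode($scraped));
        return $scraped;
    }
    if (is_file($cacheFile)) {
        // Empty scrape: try the previously cached list.
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached;
        }
    }
    return [];
}
```

With this in place your internal form keeps showing the previous categories until you have adjusted the scraper.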


LATEST UPDATE:

Added retrieval of the value, and parse warnings are now hidden using libxml_use_internal_errors.

Here's a simple PHP script that will print out the text and value of each of the drop-down options:

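    <?php
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $html = file_get_contents('http://example.com/');
    if ($html === false) {
        exit('Could not fetch the page.');
    }

    $domdoc = new DOMDocument();
    $domdoc->loadHTML($html);
    libxml_clear_errors();

    // 'tid' is the id of the <select> element on the target page.
    $menu = $domdoc->getElementById('tid');
    if ($menu === null) {
        exit('Drop-down not found.');
    }

    // Iterate over <option> elements only; childNodes also contains
    // whitespace text nodes, which have no getAttribute() method.
    foreach ($menu->getElementsByTagName('option') as $option) {
        echo htmlspecialchars(trim($option->nodeValue)) . ' - '
            . htmlspecialchars($option->getAttribute('value')) . '<br>';
    }
    ?>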
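
To reuse the result in your own form rather than just printing it, you could collect the options into an array and render a new drop-down from it. A minimal sketch, assuming each category is stored as a value/label pair; the function name `renderSelect` and the field name `category` are hypothetical.

```php
<?php
// Hypothetical helper: render scraped categories as a <select> element.
function renderSelect(string $name, array $categories): string
{
    $html = '<select name="' . htmlspecialchars($name) . '">';
    foreach ($categories as $category) {
        // Escape scraped text so it cannot inject markup into your page.
        $html .= '<option value="' . htmlspecialchars($category['value']) . '">'
            . htmlspecialchars($category['label']) . '</option>';
    }
    return $html . '</select>';
}
```

For example, `echo renderSelect('category', $categories);` would output the menu inside your internal form.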
Raad
  • Wow, this is great. Would it be possible to get the value as well? The above code gives me the name of each drop-down option but not its value. For example, I am looking for value 24 and name Business Consulting. – Saadia Sep 01 '15 at 11:24