8

I am trying to post several parameters to this [url][1] and press 'Submit' to download the generated CSV file.

I think at least 5 steps are needed.

questionhang
  • By editing your question you have now made previous answers invalid. – Reti43 Oct 18 '17 at 08:21
  • Can you show me one or two lines of your expected results? I think what you want may also be achievable using `requests`. – SIM Oct 18 '17 at 12:59

3 Answers

1

Unfortunately, I don't think you're going to be able to do this via requests. As far as I can tell, there is no POST being made when you click "Submit". It appears as though all the data is being generated by JavaScript, which requests can't deal with.

You could try using something like Selenium to automate a browser (which can handle the JS) and then scrape data from there.

SuperStew
0

Try this. It is only the core part, so you will need to process the rest according to your needs. It produces the results shown below:

import requests 

url = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata?RESOURCE_CLASS=OBSERVATION&ADQLQUERY=SELECT%20DISTINCT%20OBSERVATION.OBSERVATION_OID,OBSERVATION.MOVING_TARGET,OBSERVATION.OBSERVATION_ID,EPIC_OBSERVATION_IMAGE.ICON,EPIC_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_OBSERVATION_IMAGE.ICON,RGS_FLUXED_OBSERVATION_IMAGE.ICON_PREVIEW,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,OM_OBSERVATION_IMAGE.ICON_PREVIEW_V,OM_OBSERVATION_IMAGE.ICON_PREVIEW_B,OM_OBSERVATION_IMAGE.ICON_PREVIEW_L,OM_OBSERVATION_IMAGE.ICON_PREVIEW_U,OM_OBSERVATION_IMAGE.ICON_PREVIEW_M,OM_OBSERVATION_IMAGE.ICON_PREVIEW_S,OM_OBSERVATION_IMAGE.ICON_PREVIEW_W,OM_OBSERVATION_IMAGE.ICON_V,OM_OBSERVATION_IMAGE.ICON_B,OM_OBSERVATION_IMAGE.ICON_L,OM_OBSERVATION_IMAGE.ICON_U,OM_OBSERVATION_IMAGE.ICON_M,OM_OBSERVATION_IMAGE.ICON_S,OM_OBSERVATION_IMAGE.ICON_W,OBSERVATION.REVOLUTION,OBSERVATION.PROPRIETARY_END_DATE,OBSERVATION.RA_NOM,OBSERVATION.DEC_NOM,OBSERVATION.POSITION_ANGLE,OBSERVATION.START_UTC,OBSERVATION.END_UTC,OBSERVATION.DURATION,OBSERVATION.TARGET,PROPOSAL.TYPE,PROPOSAL.CATEGORY,PROPOSAL.AO,PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME,TARGET_TYPE.DESCRIPTION,OBSERVATION.LII,OBSERVATION.BII,OBSERVATION.ODF_VERSION,OBSERVATION.PPS_VERSION,OBSERVATION.COORD_OBS,OBSERVATION.COORD_TYPE%20FROM%20FIELD_NOT_USED%20%20WHERE%20OBSERVATION.PROPRIETARY_END_DATE%3E%272017-10-18%27%20%20AND%20%20(PROPOSAL.TYPE=%27Calibration%27%20OR%20PROPOSAL.TYPE=%27Int%20Calibration%27%20OR%20PROPOSAL.TYPE=%27Co-Chandra%27%20OR%20PROPOSAL.TYPE=%27Co-ESO%27%20OR%20PROPOSAL.TYPE=%27GO%27%20OR%20PROPOSAL.TYPE=%27HST%27%20OR%20PROPOSAL.TYPE=%27Large%27%20OR%20PROPOSAL.TYPE=%27Large-Joint%27%20OR%20PROPOSAL.TYPE=%27Triggered%27%20OR%20PROPOSAL.TYPE=%27Target-Opportunity%27%20OR%20PROPOSAL.TYPE=%27TOO%27%20OR%20PROPOSAL.TYPE=%27Triggered-Joint%27)%20%20%20ORDER%20BY%20OBSERVATION.OBSERVATION_ID&PAGE=1&PAGE_SIZE=100&RETURN_TYPE=JSON"
res = requests.get(url)
data = res.json()
result = data['data']

for item in result:
    ID = item['OBSERVATION__OBSERVATION_ID']   
    Surname = item['PROPOSAL__PI_SURNAME']
    Name = item['PROPOSAL__PI_FIRST_NAME']
    print(ID,Surname,Name)

Partial results (ID, surname and first name):

0740071301 La Palombara Nicola
0741732601 Kaspi Victoria
0741732701 Kaspi Victoria
0741732801 Kaspi Victoria
0742150101 Grosso Nicolas
0742240801 Roberts Timothy
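
Since the goal in the question is a CSV file, rows like these can be written out with the standard `csv` module. The following is only a minimal sketch and not part of the original answer: `rows_to_csv` and `FIELDS` are hypothetical names, and the sample payload reuses three of the rows printed above in place of a live `res.json()` result.

```python
import csv
import io

# Keys as used in the loop above; extend this list with any other columns you need.
FIELDS = [
    "OBSERVATION__OBSERVATION_ID",
    "PROPOSAL__PI_SURNAME",
    "PROPOSAL__PI_FIRST_NAME",
]


def rows_to_csv(payload, fields=FIELDS):
    """Return CSV text for the selected fields of payload['data']."""
    buf = io.StringIO()
    # extrasaction="ignore" drops every key that is not listed in `fields`.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for item in payload["data"]:
        writer.writerow(item)
    return buf.getvalue()


# Sample payload standing in for res.json() (values copied from the output above).
sample = {"data": [
    {"OBSERVATION__OBSERVATION_ID": "0740071301",
     "PROPOSAL__PI_SURNAME": "La Palombara", "PROPOSAL__PI_FIRST_NAME": "Nicola"},
    {"OBSERVATION__OBSERVATION_ID": "0741732601",
     "PROPOSAL__PI_SURNAME": "Kaspi", "PROPOSAL__PI_FIRST_NAME": "Victoria"},
    {"OBSERVATION__OBSERVATION_ID": "0742150101",
     "PROPOSAL__PI_SURNAME": "Grosso", "PROPOSAL__PI_FIRST_NAME": "Nicolas"},
]}
print(rows_to_csv(sample))
```

With the real response, `rows_to_csv(res.json())` should give the same layout, and the returned text can be written to a file with an ordinary `open(..., "w")`.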

Btw, when you reach the target page you will notice two tabs there. These results are derived from the OBSERVATIONS tab. The link I used above can be found in the Chrome developer tools as well.
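
If you would rather not paste the pre-encoded link, the same kind of URL can be assembled from a readable query. This is a rough sketch under assumptions: `build_query_url` is a hypothetical helper, the parameter names are the ones visible in the link above, and the shortened ADQL query is for illustration only (I have not checked that the servlet accepts a reduced column list).

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint taken from the link above.
BASE_URL = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata"


def build_query_url(adql, page=1, page_size=100):
    """Return the GET URL for one page of results of the given ADQL query."""
    params = {
        "RESOURCE_CLASS": "OBSERVATION",
        "ADQLQUERY": adql,
        "PAGE": page,
        "PAGE_SIZE": page_size,
        "RETURN_TYPE": "JSON",
    }
    # quote_via=quote encodes spaces as %20 (as in the link above) instead of '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Shortened query, for illustration only (assumption: fewer columns are accepted).
adql = (
    "SELECT DISTINCT OBSERVATION.OBSERVATION_ID,"
    "PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME "
    "FROM FIELD_NOT_USED "
    "WHERE OBSERVATION.PROPRIETARY_END_DATE>'2017-10-18' "
    "ORDER BY OBSERVATION.OBSERVATION_ID"
)
url = build_query_url(adql)
print(url)

# Decoding the query string gives back the readable ADQL text.
print(parse_qs(urlsplit(url).query)["ADQLQUERY"][0])
```

The resulting `url` can be passed to `requests.get(url)` in the same way as the hard-coded one, and the `page` argument makes it easy to request further pages.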

SIM
  • Could you specify the position of that link? I cannot find it in the developer tools. – questionhang Oct 18 '17 at 13:24
  • You can go for whatever option you like. I showed you the easier way. I'm a frequent user of Selenium as well. With Selenium, it will be a real pain in the rear to achieve what you want. Btw, I used the direct link because the `params` were already URL-encoded and the request uses the `GET` method (according to Chrome dev tools). That is it. – SIM Oct 18 '17 at 13:24
0

Since no one has posted a solution yet, here you go. You won't get far with requests, so selenium is your best choice here. If you want to use the below script without any modification, check that:

  • you are on Linux or macOS, or change dl_dir = '/tmp' to some directory you want
  • you have chromedriver installed, or change the driver to Firefox in the code (and adapt the download dir configuration to what Firefox expects)
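
If you do not want to depend on `/tmp` existing, one option is to let Python create a throwaway download directory. This is just a sketch: `make_download_prefs` is a hypothetical helper, and the only Chrome-specific part is the `download.default_directory` preference key that the script itself uses.

```python
import tempfile


def make_download_prefs(prefix="nxsa-dl-"):
    """Create a fresh temporary directory and the matching Chrome prefs dict."""
    # mkdtemp picks a suitable temp location on Linux, macOS and Windows.
    dl_dir = tempfile.mkdtemp(prefix=prefix)
    prefs = {"download.default_directory": dl_dir}
    return dl_dir, prefs


dl_dir, prefs = make_download_prefs()
print(dl_dir)
```

The returned `prefs` can then be handed to `add_experimental_option('prefs', prefs)`. Note that `mkdtemp` does not clean up after itself, so remove the directory yourself once you have moved the CSV elsewhere.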

Here is the environment tested with:

$ python -V
Python 3.5.3
$ chromedriver --version
ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2)
$ pip list --format=freeze | grep selenium
selenium==3.6.0

I commented almost every line, so let the code do the talking:

import os
import time
from selenium import webdriver
from selenium.webdriver.common import by
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import ui, expected_conditions as EC


def main():
    dl_dir = '/tmp'  # temporary download dir so I don't spam the real dl dir with csv files
    # check what files are downloaded before the scraping starts (will be explained later)
    csvs_old = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}

    # I use chrome so check if you have chromedriver installed
    # pass custom dl dir to browser instance
    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory': dl_dir}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options)
    # open page
    driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')

    # wait for search ui to appear (abort after 10 secs)
    # once there, unfold the filters panel
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//td[text()="Observation and Proposal filters"]'))).click()
    # toggle observation availability dropdown
    driver.find_element_by_xpath('//input[@title="Observation Availability"]/../../td[2]/div/img').click()
    # wait until the dropdown elements are available, then click "proprietary"
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//div[text()="Proprietary" and @class="gwt-Label"]'))).click()
    # unfold display options panel
    driver.find_element_by_xpath('//td[text()="Display options"]').click()
    # deselect "pointed observations"
    driver.find_element_by_id('gwt-uid-241').click()
    # select "epic exposures"
    driver.find_element_by_id('gwt-uid-240').click()

    # uncomment if you want to go through the activated settings and verify them
    # when commented, the form is submitted immediately
    #time.sleep(5)

    # submit the form
    driver.find_element_by_xpath('//button/span[text()="Submit"]/../img').click()
    # wait until the results table has at least one row
    ui.WebDriverWait(driver, 10).until(EC.presence_of_element_located((by.By.XPATH, '//tr[@class="MPI"]')))
    # click on save
    driver.find_element_by_xpath('//span[text()="Save table as"]').click()
    # wait for dropdown with "CSV" entry to appear
    el = ui.WebDriverWait(driver, 10).until(EC.element_to_be_clickable((by.By.XPATH, '//a[@title="Save as CSV, Comma Separated Values"]')))
    # somehow, the clickability does not suffice - selenium still whines about the wrong element being clicked
    # as a dirty workaround, wait a fixed amount of time to let js finish ui update
    time.sleep(1)
    # click on "CSV" entry
    el.click()

    # now. selenium can't tell whether the file is being downloaded
    # we have to do it ourselves
    # this is a quick-and-dirty check that waits until a new csv file appears in the dl dir
    # replace with watchdogs or whatever
    dl_max_wait_time = 10  # secs
    seconds = 0
    while seconds < dl_max_wait_time:
        time.sleep(1)
        csvs_new = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
        if csvs_new - csvs_old:  # new file found in dl dir
            print('Downloaded file should be one of {}'.format([os.path.join(dl_dir, file) for file in csvs_new - csvs_old]))
            break
        seconds += 1

    # we're done, so quit the browser (quit() also shuts down the chromedriver process)
    driver.quit()


# script entry point
if __name__ == '__main__':
    main()

If everything is fine, the script should output:

Downloaded file should be one of ['/tmp/NXSA-Results-1509061710475.csv']
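
The polling loop at the end of the script can also be pulled out into a small reusable function. The version below is a sketch rather than a drop-in replacement: `list_csvs` and `wait_for_new_csv` are hypothetical names, and it assumes (as the script does) that a finished download shows up as a new `NXSA-Results-*.csv` file in the download directory.

```python
import os
import time


def list_csvs(dl_dir, prefix="NXSA-Results-"):
    """Return the set of matching CSV file names currently in dl_dir."""
    return {name for name in os.listdir(dl_dir)
            if name.startswith(prefix) and name.endswith(".csv")}


def wait_for_new_csv(dl_dir, before, timeout=10.0, poll=0.2):
    """Poll dl_dir until a CSV appears that is not in `before`.

    Returns the sorted full paths of the new files, or raises TimeoutError
    if nothing new shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        new = list_csvs(dl_dir) - before
        if new:
            return sorted(os.path.join(dl_dir, name) for name in new)
        if time.monotonic() >= deadline:
            raise TimeoutError("no new CSV in {} after {}s".format(dl_dir, timeout))
        time.sleep(poll)


# Intended usage inside main():
#   csvs_old = list_csvs(dl_dir)      # before starting the browser
#   ...                               # click through the page as above
#   paths = wait_for_new_csv(dl_dir, csvs_old)
```

Unlike the loop in the script, this raises an error on timeout instead of silently falling through, which makes a failed download easier to notice.
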
hoefling
  • Excellent work, full of new details! After clicking CSV, it is very difficult to select "Save file" and "OK" in the pop-up window, right? – questionhang Oct 28 '17 at 08:21
  • It indeed is, and this is one reason why I used Chrome. The thing is that the save dialog is not part of the browser's UI, but is OS-specific (Explorer in Windows, Finder in macOS, Thunar/Dolphin/Krusader/whatever in Linux), so to handle such a dialog you will usually have to write a lot of code. However, most of the time we don't need to alter the download path or file name, so the save dialog is superfluous anyway. With Firefox, you can turn off the save confirmation; check [this answer](https://stackoverflow.com/a/9329022/2650249) with example code. – hoefling Oct 28 '17 at 09:06
  • Although there are libraries out there if you want to automate OS-specific UI like save dialogs; an example is [`pyautogui`](https://pyautogui.readthedocs.io/en/latest/). – hoefling Oct 28 '17 at 09:07