My goal is to download an image from a URL. In my case I can't use download.file because the picture is on a web page that requires a login, and some JavaScript runs in the background before the real image becomes visible. This is why I need to do it with the RSelenium package.

As suggested here, I've started a Docker container from the selenium/standalone-chrome image. Output from the Docker terminal:

$ docker-machine ip
192.168.99.100
$ docker ps
CONTAINER ID  IMAGE                              COMMAND                CREATED             STATUS              PORTS                    NAMES
c651dab3a948  selenium/standalone-chrome:3.4.0  "/opt/bin/entry_po..."  24 hours ago        Up 24 hours         0.0.0.0:4445->4444/tcp   cranky_kalam

Here's what I've tried:

require(RSelenium)

# Suppress the download prompt and set the default download folder
eCaps <- list(
  chromeOptions = 
    list(prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "download.default_directory" = "C:/temp/Pictures"
    )
    )
)

# Open connection
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445L,
                      browserName = "chrome", extraCapabilities = eCaps)
remDr$open()

# Navigate to desired URL with picture
url <- "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
remDr$navigate(url)
remDr$screenshot(display = TRUE) # Everything looks fine here

# Move mouse to the page's center
webElem <- remDr$findElement(using = 'xpath',value = '/html/body')
remDr$mouseMoveToLocation(webElement = webElem)

# Right click
remDr$click(2)
remDr$screenshot(display = TRUE) # I don't see the right-click dialog!
# Try to move right-click dialog to 'Save as' or 'Save image as'
remDr$sendKeysToActiveElement(list(key = 'down_arrow',
                                   key = 'down_arrow',
                                   key = 'enter'))
### NOTHING HAPPENS

I've tried varying the number of key = 'down_arrow' entries, but every time I look in C:/temp/Pictures nothing has been saved.

Please note that this is just an example and I know I could have downloaded this picture with download.file. I need a solution with RSelenium for my real case.

Gabriel Mota
  • I may be able to help if you're willing to disclose what site you're really trying to scrape from. – hrbrmstr Aug 29 '17 at 14:51
  • @hrbrmstr I'm trying to scrape from customer.roamler.com. If I find a way to download Google's picture in my example I can definitely use the same code to my real case. – Gabriel Mota Aug 29 '17 at 15:57
  • See https://stackoverflow.com/questions/42293193/rselenium-on-docker-where-are-files-downloaded/42297110#42297110 and https://stackoverflow.com/questions/42607389/download-file-with-rselenium-docker-toolbox?s=1|3.0660 and https://stackoverflow.com/questions/42476693/rselenium-hangs-in-navigate-to-direct-pdf-download for related content – jdharrison Aug 29 '17 at 19:21
  • A pkg about to be on CRAN — [`splashr`](https://github.com/hrbrmstr/splashr) — can let you do the resource grabbing w/o the need to click/download. – hrbrmstr Aug 29 '17 at 21:26
  • @jdharrison, first of all thank you very much for your wonderful package `RSelenium`. Back to my problem: I can successfully download the .cfm and .zip files used on the questions/examples you asked me to see. And this is because (I guess) they have an element to click and download. In my case I need to do a "Save as", there's no element to click on... – Gabriel Mota Aug 30 '17 at 14:25
  • @hrbrmstr Your package looks great. Is there a way to run `install_splash()` without using localhost? I'm using Windows 7 + VirtualBox to run docker containers --> I can't have a Splash instance running in my localhost (or I don't have enough knowledge to do it) – Gabriel Mota Aug 31 '17 at 11:53
  • right now, you'll have to follow their instructions — http://splash.readthedocs.io/en/stable/install.html — to do that. I'm working with some folks to make it easier to get docker-things for R more uniform across platforms. – hrbrmstr Aug 31 '17 at 11:56

1 Answer

I tried remDr$click(buttonId = 2) to perform a right click, but to no avail. One workaround is to extract the image link from the page source and download it with download.file.

library(rvest)  # provides read_html(), html_nodes(), html_attr() and %>%

# Navigate to the page that shows the image
url <- "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
remDr$navigate(url)

# Get the link of the image from the rendered page source
link <- remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes('img') %>%
  html_attr('src')
link
# [1] "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"

# Download into the current working directory using download.file
download.file(link, basename(url), method = 'curl')
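
Because download.file starts a fresh HTTP session, the approach above may fail when the image sits behind a login. Here is a minimal sketch of one possible way around that, assuming the site authenticates through cookies alone and that the httr package is installed: copy the cookies from the Selenium session and send them with the request. The helpers cookies_to_vector and download_with_session are hypothetical names introduced here for illustration, and I have not tested this against the site in question.

```r
# Hypothetical helper: turn the list returned by remDr$getAllCookies()
# into the named character vector that httr::set_cookies() accepts.
cookies_to_vector <- function(cookies) {
  values <- vapply(cookies, function(ck) as.character(ck$value), character(1))
  names(values) <- vapply(cookies, function(ck) as.character(ck$name), character(1))
  values
}

# Hypothetical helper: download `link` to `dest`, sending the browser's cookies.
# Assumes cookie-based authentication; sites that also check headers or tokens
# may still reject the request.
download_with_session <- function(remDr, link, dest) {
  session_cookies <- cookies_to_vector(remDr$getAllCookies())
  resp <- httr::GET(
    link,
    httr::set_cookies(.cookies = session_cookies),
    httr::write_disk(dest, overwrite = TRUE)
  )
  httr::stop_for_status(resp)  # raise an error on 4xx/5xx responses
  invisible(dest)
}

# Usage, with `remDr` and `link` as in the code above:
# download_with_session(remDr, link, basename(link))
```

The request is made from the R session rather than from the browser in the container, so the file is written to the local working directory. As far as I know, httr::set_cookies URL-encodes cookie values, so cookies containing special characters may need separate handling.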
Nad Pat