
I want to download a set of car pictures based on an Excel list of car make and model names.

I can do this manually by typing the car model into Google and saving or copying the URL of the first picture in the results. But I have about 800 car model names, so this is very time-consuming.

How can I automate it? Thanks.

Med El
  • Hi Med El. Welcome to StackOverflow! Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). For example, show us what you already tried and what **concrete** problem you are struggling with. That way you can help others to help you! – dario Feb 20 '20 at 12:46
  • Hi @dario, thanks for the suggestion. I think my problem is well explained. – Med El Feb 20 '20 at 12:52
  • Can you show what your Excel file looks like? – Shubham Shaswat Feb 20 '20 at 13:15
  • In the Excel sheet I have a column Make, a column Model and a column Program. I concatenate the first three columns to get the full name of the car, e.g.: Isuzu D-Max RG01, Isuzu D-Max RG01, Volkswagen Golf VW380, Volkswagen Golf VW380, Nissan X-Trail P33A, Nissan X-Trail P33A, Toyota Kluger 550B, Toyota Kluger 550B, Toyota Yaris 400B, Toyota Yaris 400B, Mazda CX-30 J59K, Mazda CX-30 J59K – Med El Feb 20 '20 at 13:17

1 Answer


Here is a function that you can use in R. You'll first need to run install.packages("rvest") and install.packages("httr").

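library(rvest)
library(httr)

# Save the first Google Images thumbnail found for `car_name` as a jpg in the home directory
get_first_google_image <- function(car_name)
{
  site <- "https://www.google.com"
  # xml2 is installed together with rvest; url_escape() percent-encodes the search term
  query <- paste0(site, "/search?q=", xml2::url_escape(car_name))

  # Find the link to the "Images" tab on the ordinary results page
  image_page <- read_html(query)                          %>% 
    html_nodes(xpath = "//a[contains(text(), 'Images')]") %>% 
    html_attr("href")

  # Stop with a readable message if no such link was found
  if (length(image_page) == 0) stop("No 'Images' link found for: ", car_name)

  # Open the images page, take the first gstatic thumbnail and write it to disk
  paste0(site, image_page[1])          %>%
    read_html()                        %>%
    html_nodes("img")                  %>% 
    html_attr("src")                   %>% 
    {grep("gstatic", ., value = TRUE)} %>% 
    `[`(1)                             %>%
    httr::GET()                        %>%
    httr::content("raw")               %>%
    writeBin(paste0("~/", car_name, ".jpg"))
}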

To use it, you just do

get_first_google_image("Mazda MX5")

It will then save the first hit from the Google image search as a jpeg to your home directory.
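
Some car names may contain characters that are awkward in file names, such as a slash. A minimal sketch of one way to build a safer file path; safe_file_name() is a hypothetical helper (not part of rvest or httr), and replacing every unusual character with an underscore is just one possible choice:

```r
# Hypothetical helper: turn a car name into a file path ending in .jpg.
# Anything other than letters, digits, dots, underscores and hyphens
# is replaced by a single underscore.
safe_file_name <- function(car_name, dir = "~") {
  clean <- gsub("[^A-Za-z0-9._-]+", "_", car_name)
  file.path(dir, paste0(clean, ".jpg"))
}

# Example: build a path inside an assumed "pics" folder
example_path <- safe_file_name("Mazda CX-30 J59K", dir = "pics")
```

The result could be passed to writeBin() in place of the paste0("~/", car_name, ".jpg") expression.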

If you want to get all your car names into R, just select and copy the column in Excel, then in R (on Windows, where readClipboard() is available) do

car_names <- readClipboard()
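
As an alternative to the clipboard, the names could be built inside R once the sheet has been loaded. A minimal sketch, assuming a data frame with columns named Make, Model and Program as described in the comments; the data frame cars below is a made-up stand-in for the real sheet:

```r
# Hypothetical data frame standing in for the Excel sheet;
# the column names are assumed from the comments on the question.
cars <- data.frame(
  Make    = c("Isuzu", "Isuzu", "Volkswagen", "Nissan"),
  Model   = c("D-Max", "D-Max", "Golf", "X-Trail"),
  Program = c("RG01", "RG01", "VW380", "P33A"),
  stringsAsFactors = FALSE
)

# Join the three columns into one search string per row
car_names <- paste(cars$Make, cars$Model, cars$Program)

# Drop repeated names so each picture is only fetched once
car_names <- unique(car_names)
```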

Then you can do

for(i in seq_along(car_names)) get_first_google_image(car_names[i])

This might take quite a long time to run.
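
With around 800 names, a single failed request would stop a plain for loop part-way through. A minimal sketch of a more forgiving loop, assuming a hypothetical helper download_all() (not part of rvest or httr) that wraps each call in tryCatch() and pauses between requests; the one-second default pause is an arbitrary choice:

```r
# Hypothetical helper: call `fetch` on every name, report failures
# instead of stopping, and wait `pause` seconds after each request.
download_all <- function(car_names, fetch, pause = 1) {
  ok <- logical(length(car_names))
  for (i in seq_along(car_names)) {
    ok[i] <- tryCatch({
      fetch(car_names[i])
      TRUE
    }, error = function(e) {
      message("Failed for '", car_names[i], "': ", conditionMessage(e))
      FALSE
    })
    Sys.sleep(pause)
  }
  # Return the names that could not be downloaded
  car_names[!ok]
}

# Intended use (not run here):
# failed <- download_all(car_names, get_first_google_image)
```

The returned vector holds the names that failed, so they can be retried or looked up by hand afterwards.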

Allan Cameron
  • Hello Allan, thank you for replying. I tried your code in R and I get this error: Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, : Expecting a single string value: [type=character; extent=0]. Called from: doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, options = options) – Med El Feb 28 '20 at 15:50