I am using rvest to retrieve the result titles from a Google search results page. My code looks like this:
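library(rvest)
# Build the search URL and download the results page
url <- URLencode(paste0("https://www.google.com.au/search?q=", "600d"))
page <- read_html(url)
# Extract the text of every <a> element on the page
page %>%
  html_nodes("a") %>%
  html_text()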
However, the result includes not just the titles but also the text of every other link on the page, for example:
[24] "Past month"
[25] "Past year"
[26] "Verbatim"
[27] "EOS 600D - Canon"
[28] "Similar"
[29] "Canon 600D | BIG W"
[30] "Cached"
[31] "Similar"
......
[45] ""
[46] ""
What I need are only [27] "EOS 600D - Canon" and [29] "Canon 600D | BIG W", which are the titles of the results as they appear on the Google results page.
Everything else is just noise to me. Could anyone please tell me how to get rid of it?
Also, if I want the description of each result as well, what should I do?
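
To make the goal concrete, here is a minimal sketch of the kind of selection I am after, run against a made-up HTML fragment rather than the live page. The fragment, the class names `r` and `st`, and the example.com links are assumptions for illustration only; I have not confirmed that Google's actual markup looks like this, so the selectors would need to be checked against the real page source.

```r
library(rvest)

# Hypothetical, simplified stand-in for a results page. The real markup
# may differ and can change over time, so inspect the live page first.
html <- '<div>
  <a href="#">Past month</a>
  <a href="#">Verbatim</a>
  <div class="result">
    <h3 class="r"><a href="https://example.com/1">EOS 600D - Canon</a></h3>
    <a href="#">Cached</a> <a href="#">Similar</a>
    <span class="st">Description of the first result.</span>
  </div>
  <div class="result">
    <h3 class="r"><a href="https://example.com/2">Canon 600D | BIG W</a></h3>
    <a href="#">Similar</a>
    <span class="st">Description of the second result.</span>
  </div>
</div>'

page <- read_html(html)

# Select only links nested inside a title heading, not every <a> on the page
titles <- page %>%
  html_nodes("h3.r a") %>%
  html_text()

# Select the (assumed) description element of each result
descriptions <- page %>%
  html_nodes("span.st") %>%
  html_text()

print(titles)
print(descriptions)
```

With this fragment, the narrower selector skips links such as "Past month", "Cached" and "Similar", which is the behaviour I want on the real page.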