
I am trying to grab an email address from a set of webpages, but the script I am using does nothing. The email is located in a popup section that lives inside a script tag rather than under a class. I am not interested in extracting the entire script and then parsing out the email; I want to get the email in one go. Is there a way to do this?

I have put my code and the HTML for the page below.

email <- html_text(html_nodes(doc, xpath = "//a[@class = 'email']"))

<script type="application/json" data-iso-key="_0">

{"#name":"e-address","$":{"type":"email"},"_":"myname@university.edu"}]}

The issue is that the email only exists in the page in this form.
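Because the address sits in a `<script type="application/json">` tag and not in an `a` tag, the xpath above matches nothing. A minimal sketch of selecting the script node by its attributes instead (the HTML string here is a hypothetical stand-in for the real page, with a well-formed version of the truncated JSON fragment above; assumes the rvest package):

```r
library(rvest)

# Hypothetical page fragment containing the script tag shown above
doc <- read_html('<html><body>
  <script type="application/json" data-iso-key="_0">
    {"#name":"e-address","$":{"type":"email"},"_":"myname@university.edu"}
  </script>
</body></html>')

# Select the JSON <script> node by its data-iso-key attribute, not by class
raw_json <- html_text(html_nodes(doc, xpath = "//script[@data-iso-key = \'_0\']"))
```

`raw_json` then holds the raw JSON text of the script tag, which still needs to be parsed to isolate the address.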

JWH2006
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. I don't understand why you are not interested in extracting the address from the script since that's where the address actually seems to exist. Does the e-mail address exist inside an `a` tag at all on the page? – MrFlick Jun 12 '18 at 14:45
  • it does not, that is my current issue. – JWH2006 Jun 12 '18 at 14:46
  • Then using something like `html_nodes` doesn't make any sense. Are you new to web scraping? I'm having a hard time following your thought process here. It's hard to make any recommendations based on the very limited example provided here. – MrFlick Jun 12 '18 at 14:49
  • Yes, I am new to web scraping. I understand that there is no associated HTML node, but I was illustrating the code I had used previously for scraping. My current line of thought is to extract the script, put it all into a CSV, search that file for "email", and extract the 50 characters to the right minus the first 8. I just don't know if there is a better way. – JWH2006 Jun 12 '18 at 14:54
  • 1
    Well, the script tag would imply that the data is in a JSON format, and there are plenty of JSON parsers for R. Most webpages are not easily scraped because they require that javascript be run in order to set up the webpage. If you need R to run javascript, you'd need to use a package like `Rselenium`. Again, there's too little context here to provide a specific recommendation. – MrFlick Jun 12 '18 at 14:57
  • that's actually very helpful and gives me something to build off of where before I did not have a good idea of how to tackle this problem. – JWH2006 Jun 12 '18 at 15:25
  • Kinda hoping this wasn't giving an assist to a spammer or phisher – hrbrmstr Oct 14 '18 at 02:14
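Following the suggestion in the comments to use a JSON parser, one way the whole pipeline could look (a sketch, assuming the rvest and jsonlite packages; the HTML string is a hypothetical stand-in for the real page, using a well-formed version of the fragment shown in the question):

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the real page; the JSON in the question is truncated
doc <- read_html('<html><body>
  <script type="application/json" data-iso-key="_0">
    {"#name":"e-address","$":{"type":"email"},"_":"myname@university.edu"}
  </script>
</body></html>')

# Pull the raw JSON text out of the script tag, then parse it
raw_json <- html_text(html_nodes(doc, xpath = "//script[@type = \'application/json\']"))
parsed   <- fromJSON(raw_json)

# In the fragment shown, the "_" field holds the address itself
email <- parsed[["_"]]
```

On the real pages the JSON will be nested more deeply than this fragment, so the final subscript would need to follow the actual structure returned by `fromJSON` rather than a top-level `"_"`.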

0 Answers