
Hey, I'm trying to write some code that automates extracting all the emails from a website by going through all of its links and checking each page for regex matches, but I can't figure it out. Here is what I have:

function getEmails() {
  // innerHTML is already a string, so no toString() call is needed
  var searchIn = document.body.innerHTML;

  // match() returns an array of all email-like strings, or null if none are found
  var arrayMails = searchIn.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
  return arrayMails;
}
Nicolas Vriak
  • I have no clue what you expect from us. Where is your problem? We need more than just a function; show sample code with a list of emails so we know what it is supposed to search for. – Oen44 Apr 19 '17 at 20:00
  • Your question seems very broad... At a high level you'll want to read the first page, store the links, and grab all the emails. Then iterate through the links you stored to discover more links and emails. Though, you may want to restrict the links you store to links that are related to the site you're scraping. If you don't, you could end up trying to scrape some pages you have absolutely no interest in. – blaze_125 Apr 19 '17 at 20:11
  • Yes, I need the links to the other pages of the website itself... so do you have any idea? – Nicolas Vriak Apr 19 '17 at 20:17
  • The idea is my previous post. Read the first page, store links, grab emails. Then iterate through the stored links to discover more links and emails. Restrict stored links to a predefined link pattern so that you don't end up scraping some site you don't want. Since this idea is recursive, it covers all your bases. – blaze_125 Apr 19 '17 at 20:26
  • You already have something to get the emails out of the page. Here is an [SO post that discusses grabbing links](http://stackoverflow.com/questions/3871358/get-all-the-href-attributes-of-a-web-site/3871370#3871370) – blaze_125 Apr 19 '17 at 20:32

1 Answer


You have to create a loop that opens every link presented on the main page via an AJAX request and, for each page opened, uses your function to get the emails from it and push them into an array. You will then end up with a single array holding all the results. You will also need to make sure your loop isn't infinite, so storing all the links that have already been visited is necessary.
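As a rough sketch of that idea (not a drop-in solution — it assumes a browser environment where `fetch` and `DOMParser` are available, and it will only work for same-origin pages because of the browser's same-origin policy; the page limit and function names are my own):

```javascript
// Reuses the question's regex to pull email-like strings out of raw text.
function extractEmails(text) {
  return text.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi) || [];
}

// Breadth-first crawl: start at one URL, follow same-site links,
// and collect emails from every page visited.
async function crawlForEmails(startUrl, maxPages = 50) {
  const visited = new Set();          // guards against infinite loops
  const queue = [startUrl];
  const emails = new Set();           // de-duplicates results
  const origin = new URL(startUrl).origin;

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    // Fetch the page; on a network error just skip it.
    const html = await fetch(url).then(r => r.text()).catch(() => "");
    extractEmails(html).forEach(e => emails.add(e));

    // Parse the HTML and enqueue links that stay on the same site.
    const doc = new DOMParser().parseFromString(html, "text/html");
    for (const a of doc.querySelectorAll("a[href]")) {
      const link = new URL(a.getAttribute("href"), url).href;
      if (link.startsWith(origin) && !visited.has(link)) queue.push(link);
    }
  }
  return [...emails];
}
```

The `visited` set is what the answer means by storing already-used links: without it, two pages that link to each other would keep the loop running forever.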

b6mba