I'm trying to parse web pages recursively with PhantomJS.
For example:
WebPage:
link1,
link2,
link3,
link4,
link5
nextPage
Here is what I'm doing with this page:
var parsePage = function(links) {
    // parse every link on the page
    for (var i = 0; i < links.length; i++) {
        parsePost(links[i]);
    }
};
parsePost collects some information from each linked page, such as all emails and phone numbers matched by regex, which takes a lot of time.
But PhantomJS (JavaScript) is asynchronous: it doesn't wait until every link has been parsed before going on to nextPage. What actually happens is:
- parsing page1
- parsing link1
- parsing link2
....
- parsing link5
- parsing page2
- parsing link1
....
- parsing link5
-> only now do the results from page1 -> link1 arrive in the console
.....
- parsing page3
So it eats up my PC's 6 GB of memory in 3 minutes :DDD
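The ordering in the log above can be reproduced with a small sketch. This is not PhantomJS code: fakeOpen and flush are hypothetical stand-ins for page.open and the event loop, used only to show that a plain for loop starts every request before any callback has run.

```javascript
// Hypothetical stand-in for page.open(): it records the request and
// returns immediately; the callback runs later, when flush() is called.
var pending = [];
var log = [];

function fakeOpen(url, callback) {
  log.push('start ' + url);
  pending.push(function () { callback(url); });
}

// Hypothetical stand-in for the event loop delivering the results.
function flush() {
  while (pending.length > 0) {
    pending.shift()();
  }
}

function parsePage(links) {
  for (var i = 0; i < links.length; i++) {
    fakeOpen(links[i], function (url) {
      log.push('done ' + url);
    });
  }
}

parsePage(['link1', 'link2', 'link3']);
// At this point all three requests are in flight and none has finished.
flush();
```

Every pending request holds its own resources until its callback has run, which would fit the memory growth described above.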
How can I solve this problem?
What I have tried or considered:
1. Maybe limit the program's memory use? (Would it then wait until some processes finish before continuing to parse other pages?)
2. I tried something like:
> page.open(link, function(... here is pageparser (which parses every link))
and then page.close()
But pageparser takes a lot of time, so when I call page.close() it stops the pageparser process.
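One direction the second attempt seems to point at is to start the next link only after the previous one has reported that it is finished. Below is a minimal sketch of that idea, assuming the parser can be given a callback to call when it is done. processSequentially, worker and next are hypothetical names, not part of PhantomJS.

```javascript
// Process items one at a time: the worker for the next item starts only
// after the worker for the current item has called next().
function processSequentially(items, worker, done) {
  var index = 0;
  function next() {
    if (index >= items.length) {
      done();
      return;
    }
    var item = items[index];
    index += 1;
    worker(item, next);
  }
  next();
}

// Usage with a placeholder worker. In PhantomJS the worker would call
// page.open(link, ...) and invoke page.close() and next() only once
// parsing of that link has actually finished.
var order = [];
processSequentially(['link1', 'link2', 'link3'], function (link, next) {
  order.push('parsed ' + link);
  next();
}, function () {
  order.push('all done');
});
```

With this shape only one link is open at any time, so page.close() would be called from inside the worker once its parsing is done, rather than right after page.open() returns.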