I am developing a web crawler application. When I run the program, I get the error messages below:

[screenshot of the error messages]

I got these errors after running the program for more than 3 hours. I tried to allocate more memory by setting 2048 MB of RAM in eclipse.ini, as suggested in this topic, but I still get the same errors after 3 hours or less. I need to run the program non-stop for more than 2-3 days to analyse the results.

Can you tell me what I am missing that causes these errors?

These are my classes:

seeds.txt

http://www.stanford.edu
http://www.archive.org

WebCrawler.java

package pkg.crawler;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.jsoup.HttpStatusException;
import org.jsoup.UnsupportedMimeTypeException;
import org.joda.time.DateTime;


public class WebCrawler {

public static Queue <LinkNodeLight> queue = new PriorityBlockingQueue <> (); // priority queue
public static final int n_threads = 5;                                 // amount of threads
private static Set<String> processed = new LinkedHashSet <> ();         // set of processed urls
private PrintWriter out;                                                // output file
private PrintWriter err;                                                // error file
private static Integer cntIntra = new Integer (0);                              // counters for intra- links in the queue
private static Integer cntInter = new Integer (0);                              // counters for inter- links in the queue
private static Integer dub = new Integer (0);                                   // amount of skipped urls

public static void main(String[] args) throws Exception {
    System.out.println("Running web crawler: " + new Date());

    WebCrawler webCrawler = new WebCrawler();
    webCrawler.createFiles();
    try (Scanner in = new Scanner(new File ("seeds.txt"))) {
        while (in.hasNext()) {
            webCrawler.enque(new LinkNode (in.nextLine().trim()));
        }
    } catch (IOException e) {
        e.printStackTrace();
        return;
    }
    webCrawler.processQueue();
    webCrawler.out.close();
    webCrawler.err.close();
}

public void processQueue(){
    /* run in threads */
    Runnable r = new Runnable() {
        @Override 
        public void run() {
            /* queue may be empty but process is not finished, that's why we need to check if any links are being processed */
            while (true) {
                LinkNode link = deque();
                if (link == null)
                    continue;
                link.setStartTime(new DateTime());
                boolean process = processLink(link);
                link.setEndTime(new DateTime());
                if (!process)
                    continue;
                /* print the data to the csv file */
                if (link.getStatus() != null && link.getStatus().equals(LinkNodeStatus.OK)) {
                    synchronized(out) {
                        out.println(getOutputLine(link));
                        out.flush();
                    }
                } else {
                    synchronized(err) {
                        err.println(getOutputLine(link));
                        err.flush();
                    }
                }
            }
        }
    };
    /* run n_threads threads which perform dequeue and process */
    LinkedList <Thread> threads = new LinkedList <> ();
    for (int i = 0; i < n_threads; i++) {
        threads.add(new Thread(r));
        threads.getLast().start();
    }
    for (Thread thread : threads) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}


/* returns true if link was actually processed */
private boolean processLink(LinkNode inputLink) {
    String url = getUrlGeneralForm(inputLink);
    boolean process = true;
    synchronized (processed) {
        if (processed.contains(url)) {
            process = false;
            synchronized (dub) {dub++;}
        } else
            processed.add(url);
    }
    /* start processing only if the url have not been processed yet or not being processed */
    if (process) {
        System.out.println("Processing url " + url);
        List<LinkNodeLight> outputLinks = parseAndWieghtResults(inputLink);
        for (LinkNodeLight outputLink : outputLinks) {
            String getUrlGeneralForumOutput = getUrlGeneralForm(outputLink);
            /* add the new link to the queue only if it has not been processed yet */
            process = true;
            synchronized (processed) {
                if (processed.contains(getUrlGeneralForumOutput)) {
                    process = false;
                    synchronized (dub) {dub++;}
                }
            }
            if (process) {
                enque(outputLink);
            }
        }
        return true;
    }
    return false;
}

void enque(LinkNodeLight link){
    link.setEnqueTime(new DateTime());
    /* the add method requires implicit priority */
    synchronized (queue) {
        if (link.interLinks)
            synchronized (cntInter) {cntInter++;}
        else
            synchronized (cntIntra) {cntIntra++;}
      //queue.add(link, 100 - (int)(link.getWeight() * 100.f));
        queue.add(link);
    }
}


/**
 * Picks an element from the queue
 * @return top element from the queue or null if the queue is empty
 */
LinkNode deque(){
    /* link must be checked */
    LinkNode link = null;
    synchronized (queue) {
        link = (LinkNode) queue.poll();
        if (link != null) {
            link.setDequeTime(new DateTime());
            if (link.isInterLinks())
                synchronized (cntInter) {cntInter--;}
            else
                synchronized (cntIntra) {cntIntra--;}
        }
    }
    return link;
}

private void createFiles() {
    /* create output file */
    try {
        out = new PrintWriter(new BufferedWriter(new FileWriter("CrawledURLS.csv", false)));
        out.println(generateHeaderFile());
    } catch (IOException e) {
        System.err.println(e);
    }
    /* create error file */
    try {
        err = new PrintWriter(new BufferedWriter(new FileWriter("CrawledURLSERROR.csv", false)));
        err.println(generateHeaderFile());
    } catch (IOException e) {
        System.err.println(e);
    }
}
/**
 * formats the string so it can be valid entry in csv file
 * @param s
 * @return
 */
private static String format(String s) {
    // replace " by ""
    String ret = s.replaceAll("\"", "\"\"");
    // put string into quotes
    return "\"" + ret + "\"";
}
/**
 * Creates the line that needs to be written in the outputfile
 * @param link
 * @return
 */
public static String getOutputLine(LinkNode link){
    StringBuilder builder = new StringBuilder();
    builder.append(link.getParentLink()!=null ? format(link.getParentLink().getUrl()) : "");
    builder.append(",");
    builder.append(link.getParentLink()!=null ? link.getParentLink().getIpAdress() : "");
    builder.append(",");
    builder.append(link.getParentLink()!=null ? link.getParentLink().linkProcessingDuration() : "");
    builder.append(",");
    builder.append(format(link.getUrl()));
    builder.append(",");
    builder.append(link.getDomain());
    builder.append(",");
    builder.append(link.isInterLinks());
    builder.append(",");
    builder.append(Util.formatDate(link.getEnqueTime()));
    builder.append(",");
    builder.append(Util.formatDate(link.getDequeTime()));
    builder.append(",");
    builder.append(link.waitingInQueue());
    builder.append(",");
    builder.append(queue.size());
    /* Inter and intra links in queue */
    builder.append(",");
    builder.append(cntIntra.toString());
    builder.append(",");
    builder.append(cntInter.toString());
    builder.append(",");
    builder.append(dub);
    builder.append(",");
    builder.append(new Date ());
    /* URL size*/
    builder.append(",");
    builder.append(link.getSize());
    /* HTML file
    builder.append(",");
    builder.append(link.getFileName());*/
    /* add HTTP error */
    builder.append(",");
    if (link.getParseException() != null) {
        if (link.getParseException() instanceof HttpStatusException)
            builder.append(((HttpStatusException) link.getParseException()).getStatusCode());
        if (link.getParseException() instanceof SocketTimeoutException)
            builder.append("Time out");
        if (link.getParseException() instanceof MalformedURLException)
            builder.append("URL is not valid");
        if (link.getParseException() instanceof UnsupportedMimeTypeException)
            builder.append("Unsupported mime type: " + ((UnsupportedMimeTypeException)link.getParseException()).getMimeType());
    }
    return builder.toString();

}

/**
 * generates the Header for the file
 * @param link
 * @return
 */
private String generateHeaderFile(){
    StringBuilder builder = new StringBuilder();
    builder.append("Seed URL");
    builder.append(",");
    builder.append("Seed IP");
    builder.append(",");
    builder.append("Process Duration");
    builder.append(",");
    builder.append("Link URL");
    builder.append(",");
    builder.append("Link domain");
    builder.append(",");
    builder.append("Link IP");
    builder.append(",");
    builder.append("Enque Time");
    builder.append(",");
    builder.append("Deque Time");
    builder.append(",");
    builder.append("Waiting in the Queue");
    builder.append(",");
    builder.append("QueueSize");
    builder.append(",");
    builder.append("Intra in queue");
    builder.append(",");
    builder.append("Inter in queue");
    builder.append(",");
    builder.append("Dublications skipped");
    /* time was printed, but no header was */
    builder.append(",");
    builder.append("Time");
    /* URL size*/
    builder.append(",");
    builder.append("Size bytes");
    /* HTTP errors */
    builder.append(",");
    builder.append("HTTP error");
    return builder.toString();

}



String getUrlGeneralForm(LinkNodeLight link){
    String url = link.getUrl();
    if (url.endsWith("/")){
        url = url.substring(0, url.length() - 1);
    }
    return url;
}


private List<LinkNodeLight> parseAndWieghtResults(LinkNode inputLink) {
    List<LinkNodeLight> outputLinks = HTMLParser.parse(inputLink);
    if (inputLink.hasParseException()) {
        return outputLinks;
    } else {
        return URLWeight.weight(inputLink, outputLinks);
    }
}
}

HTMLParser.java

package pkg.crawler;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.math.BigInteger;
import java.util.Formatter;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;
import java.security.*;
import java.nio.file.Path;
import java.nio.file.Paths;


public class HTMLParser {

private static final int READ_TIMEOUT_IN_MILLISSECS = (int) TimeUnit.MILLISECONDS.convert(30, TimeUnit.SECONDS);
private static HashMap <String, Integer> filecounter = new HashMap<> ();


public static List<LinkNodeLight> parse(LinkNode inputLink){
    List<LinkNodeLight> outputLinks = new LinkedList<>();
    try {
        inputLink.setIpAdress(IpFromUrl.getIp(inputLink.getUrl()));
        String url = inputLink.getUrl();
        if (inputLink.getIpAdress() != null) {
            /* NOTE: String.replace returns a new String; the result is discarded here, so url is unchanged */
            url.replace(URLWeight.getHostName(url), inputLink.getIpAdress());
        }
        Document parsedResults =  Jsoup
                .connect(url)
                .timeout(READ_TIMEOUT_IN_MILLISSECS)
                .get();
        inputLink.setSize(parsedResults.html().length());
        /* IP address moved here in order to speed up the process */
        inputLink.setStatus(LinkNodeStatus.OK);
        inputLink.setDomain(URLWeight.getDomainName(inputLink.getUrl()));
        if (true) {
            /* save the file to the html */
            String filename = parsedResults.title();//digestBig.toString(16) + ".html";
            if (filename.length() > 24) {
                filename = filename.substring(0, 24);
            }
            filename = filename.replaceAll("[^\\w\\d\\s]", "").trim();
            filename = filename.replaceAll("\\s+",  " ");

            if (!filecounter.containsKey(filename)) {
                filecounter.put(filename, 1);
            } else {
                Integer tmp = filecounter.remove(filename);
                filecounter.put(filename, tmp + 1);
            }
            filename = filename + "-" + (filecounter.get(filename)).toString() + ".html";
            filename = Paths.get("downloads", filename).toString();
            inputLink.setFileName(filename);
            /* use md5 of url as file name */
            try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(filename)))) {
                out.println("<!--" + inputLink.getUrl() + "-->");
                out.print(parsedResults.html());
                out.flush();
                out.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        String tag;
        Elements tagElements;
        List<LinkNode> result;


        tag = "a[href";
        tagElements = parsedResults.select(tag);
        result = toLinkNodeObject(inputLink, tagElements, tag);
        outputLinks.addAll(result);


        tag = "area[href";
        tagElements = parsedResults.select(tag);
        result = toLinkNodeObject(inputLink, tagElements, tag);
        outputLinks.addAll(result);
    } catch (IOException e) {
        inputLink.setParseException(e);
        inputLink.setStatus(LinkNodeStatus.ERROR);
    }

    return outputLinks;
}


static List<LinkNode> toLinkNodeObject(LinkNode parentLink, Elements tagElements, String tag) {
    List<LinkNode> links = new LinkedList<>();
    for (Element element : tagElements) {

        if(isFragmentRef(element)){
            continue;
        }

        String absoluteRef = String.format("abs:%s", tag.contains("[") ? tag.substring(tag.indexOf("[") + 1, tag.length()) : "href");
        String url = element.attr(absoluteRef);

        if(url!=null && url.trim().length()>0) {
            LinkNode link = new LinkNode(url);
            link.setTag(element.tagName());
            link.setParentLink(parentLink);
            links.add(link);
        }
    }
    return links;
}

static boolean isFragmentRef(Element element){
    String href = element.attr("href");
    return href!=null && (href.trim().startsWith("#") || href.startsWith("mailto:"));
}

}

Util.java

package pkg.crawler;

import java.util.Date;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;


public class Util {

private static DateTimeFormatter formatter;
static {



    formatter =   DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss:SSS");


}


public static String linkToString(LinkNode inputLink){


    return String.format("%s\t%s\t%s\t%s\t%s\t%s",
            inputLink.getUrl(),
            inputLink.getWeight(),
            formatDate(inputLink.getEnqueTime()),
            formatDate(inputLink.getDequeTime()),
            differenceInMilliSeconds(inputLink.getEnqueTime(), inputLink.getDequeTime()),
            inputLink.getParentLink()==null?"":inputLink.getParentLink().getUrl()
    );
}

public static String linkToErrorString(LinkNode inputLink){

    return String.format("%s\t%s\t%s\t%s\t%s\t%s",
            inputLink.getUrl(),
            inputLink.getWeight(),
            formatDate(inputLink.getEnqueTime()),
            formatDate(inputLink.getDequeTime()),
            inputLink.getParentLink()==null?"":inputLink.getParentLink().getUrl(),
            inputLink.getParseException().getMessage()
    );
}


public static String formatDate(DateTime date){
    return formatter.print(date);
}

public static long differenceInMilliSeconds(DateTime dequeTime, DateTime enqueTime){
    return (dequeTime.getMillis()- enqueTime.getMillis());
}

public static int differenceInSeconds(Date enqueTime, Date dequeTime){
    return (int)((dequeTime.getTime()/1000) - (enqueTime.getTime()/1000));
}

public static int differenceInMinutes(Date enqueTime, Date dequeTime){
    return (int)((dequeTime.getTime()/60000) - (enqueTime.getTime()/60000));
}

}

URLWeight.java

package pkg.crawler;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Pattern;

public class URLWeight {

public static List<LinkNodeLight> weight(LinkNode sourceLink, List<LinkNodeLight> links) {

    List<LinkNodeLight> interLinks = new LinkedList<>();
    List<LinkNodeLight> intraLinks = new LinkedList<>();

    for (LinkNodeLight link : links) {
        if (isIntraLink(sourceLink, link)) {
            intraLinks.add(link);
            link.setInterLinks(false);
        } else {
            interLinks.add(link);
            link.setInterLinks(true);
        }
    }



static boolean isIntraLink(LinkNodeLight sourceLink, LinkNodeLight link){

    String parentDomainName = getHostName(sourceLink.getUrl());

    String childDomainName = getHostName(link.getUrl());
    return parentDomainName.equalsIgnoreCase(childDomainName);
}

public static String getHostName(String url) {
    if(url == null){
    //  System.out.println("Deneme");
        return "";

    }

    String domainName = new String(url);

    int index = domainName.indexOf("://");
    if (index != -1) {

        domainName = domainName.substring(index + 3);
    }
    for (int i = 0; i < domainName.length(); i++)
        if (domainName.charAt(i) == '?' || domainName.charAt(i) == '/') {
            domainName = domainName.substring(0, i);
            break;
        }

    /*if (index != -1) {

        domainName = domainName.substring(0, index);
    }*/

    /* have to keep www in order to do replacements with IP */
    //domainName = domainName.replaceFirst("^www.*?\\.", "");

    return domainName;
}
public static String getDomainName(String url) {
    String [] tmp= getHostName(url).split("\\.");
    if (tmp.length == 0)
        return "";
    return tmp[tmp.length - 1];
}


}

PingTaskManager.java

package pkg.crawler;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PingTaskManager {

private static ExecutorService executor = Executors.newFixedThreadPool(100);

public  static void ping (LinkNode e) {
    executor.submit(new PingTaks(e));
}


}

class PingTaks implements Runnable {
 private LinkNode link;
public PingTaks( LinkNode link ) {
    this.link = link;
}

@Override
public void run() {
    /* link.ping(); */      
}


}

LinkNodeStatus.java

package pkg.crawler;

public enum LinkNodeStatus {
OK,
ERROR

}

LinkNodeLight.java

package pkg.crawler;

import org.joda.time.DateTime;

public class LinkNodeLight implements Comparable<LinkNodeLight> {
protected String url;
protected float weight;
protected DateTime enqueTime;
protected boolean interLinks;

public String getUrl() {
    return url;
}

public float getWeight() {
    return weight;
}

public void setWeight(float weight) {
    this.weight = weight;
}

public DateTime getEnqueTime() {
    return enqueTime;
}


public LinkNodeLight(String url) {
    this.url = url;
}


public void setEnqueTime(DateTime enqueTime) {
    this.enqueTime = enqueTime;
}

@Override
public int compareTo(LinkNodeLight link) {

    if (this.weight < link.weight) return 1;
     else if (this.weight > link.weight) return -1;
        return 0;

    }
}

LinkNode.java

package pkg.crawler;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Socket;
import java.net.URL;
import java.net.UnknownHostException;
import java.util.Date;



import org.joda.time.DateTime;


public class LinkNode extends LinkNodeLight{
public LinkNode(String url) {
    super(url);
}

private String tag;
private LinkNode parentLink;
private IOException parseException = null; // initialize parse Exception with null
private float weight;
private DateTime dequeTime;
private DateTime startTime;
private DateTime endTime;
private LinkNodeStatus status;
private String ipAdress;
private int size;
private String filename;
private String domain;

public DateTime getStartTime() {
    return startTime;
}

public void setStartTime(DateTime startTime) {
    this.startTime = startTime;
}

public DateTime getEndTime() {
    return endTime;
}

public void setEndTime(DateTime endTime) {
    this.endTime = endTime;
}

public DateTime getDequeTime() {
    return dequeTime;
}

public String getTag() {
    return tag;
}

public LinkNode getParentLink() {
    return parentLink;
}

public Exception getParseException() {
    return parseException;
}

public boolean hasParseException(){
    return parseException!=null;
}


public void setDequeTime(DateTime dequeTime) {
    this.dequeTime = dequeTime;
}

public void setTag(String tag) {
    this.tag = tag;
}

public void setParentLink(LinkNode parentLink) {
    this.parentLink = parentLink;
}

public void setParseException(IOException parseException) {
    this.parseException = parseException;
}

@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }

    LinkNode link = (LinkNode) o;

    if (url != null ? !url.equals(link.url) : link.url != null) {
        return false;
    }

    return true;
}

@Override
public int hashCode() {
    return url != null ? url.hashCode() : 0;
}

public long waitingInQueue(){
    return Util.differenceInMilliSeconds( dequeTime,enqueTime );
}

public long linkProcessingDuration(){
    return Util.differenceInMilliSeconds( endTime,startTime );
}

@Override
public String toString() {
    StringBuilder sb = new StringBuilder("LinkNode{");
    sb.append("url='").append(url).append('\'');
    sb.append(", score=").append(weight);
    sb.append(", enqueTime=").append(enqueTime);
    sb.append(", dequeTime=").append(dequeTime);
    sb.append(", tag=").append(tag);
    if(parentLink!=null) {
        sb.append(", parentLink=").append(parentLink.getUrl());
    }
    sb.append('}');
    return sb.toString();
}

public void setStatus(LinkNodeStatus status) {
    this.status = status;
}

public LinkNodeStatus getStatus(){
    if (status == null) {
        status = LinkNodeStatus.ERROR;
    }
    return status;
}

// check server link is it exist or not
/* this method gives fake errors
public LinkNodeStatus ping () {

    boolean reachable = false;
    String sanitizeUrl = url.replaceFirst("^https", "http");

    try {
        HttpURLConnection connection = (HttpURLConnection) new URL(sanitizeUrl).openConnection();
        connection.setConnectTimeout(1000);
        connection.setRequestMethod("HEAD");
        int responseCode = connection.getResponseCode();
        System.err.println(url + " " + responseCode);
        reachable = (200 <= responseCode && responseCode <= 399);
    } catch (IOException exception) {
    }
    return reachable?LinkNodeStatus.OK: LinkNodeStatus.ERROR;
}*/


public String getIpAdress() {
    return ipAdress;
}

public void setIpAdress(String ipAdress) {
    this.ipAdress = ipAdress;
}

/* methods for controlling url size */
public void setSize(int size) {
    this.size = size;
}

public int getSize() {
    return this.size;
}

public void setFileName(String filename) {
    this.filename = filename;
}

public String getFileName() {
    return this.filename;
}

public String getDomain() {
    return domain;
}

public void setDomain(String domain) {
    this.domain = domain;
    }
}
medo0070
  • "I tried to allocate memory by changing eclipse.ini setting to 2048 MB of ram as it was answered in this topic" Well, that thread suggests: "The simple answer (see above) is to increase the JVM memory size. This will help, but it is likely that the real problem is that your web crawling algorithm is creating an in-memory data structure that grows in proportion to the number of pages you visit. If that is the case, the solution maybe to move the data in that data structure to disc; e.g. a database." It's not only about increasing heap size. – Balkrishna Rawool Jun 19 '15 at 21:53
  • Thank you for your answer. But I am keeping the frontier list of un-crawled URLs in memory, in a priority queue whose size is dynamic. How do you suggest keeping the frontier list of un-crawled URLs? If I read it from disk, could this affect my program's speed? – medo0070 Jun 19 '15 at 22:02
  • I haven't looked at the whole source in your question. But this question http://stackoverflow.com/questions/12807797/java-get-available-memory has an answer which shows how to get memory statistics. I suggest you log memory usage to find out what is causing so much memory use and to get clues about how to minimize or avoid it. – Balkrishna Rawool Jun 19 '15 at 22:08
  • Thank you, I will try to apply it – medo0070 Jun 19 '15 at 22:27
  • I think you should keep the queue to a certain size. But `PriorityBlockingQueue` has no adding method that waits for space to become available. The methods `add()`, `put()` and `offer()` of the class never block. If you change the class of the queue to `LinkedBlockingQueue`, it is possible to limit the memory consumption. But there is a possibility of deadlock, with all the threads waiting for space to become available. Hmm.... –  Jun 19 '15 at 23:18
  • [Cross-posted on Code Review](http://codereview.stackexchange.com/q/94138/9357) – 200_success Jun 20 '15 at 05:48
  • @saka1029 If I use PriorityQueue instead of PriorityBlockingQueue, will that help me avoid running out of memory? – medo0070 Jun 20 '15 at 12:35
  • @medo0070 No. PriorityQueue is a non-thread-safe version of PriorityBlockingQueue. It is better to wait (Thread.sleep) while the size of the queue exceeds some threshold in `WebCrawler.enque()`. But it may cause a deadlock in which all the threads sleep. –  Jun 20 '15 at 13:18
  • So you prefer using LinkedBlockingQueue over PriorityBlockingQueue? Another question: does LinkedBlockingQueue have automatic priority ordering like PriorityQueue? – medo0070 Jun 20 '15 at 13:28
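One way to cap the frontier without the deadlock risk raised in the comments is to reject new links when the queue is full instead of waiting for space. The following is only a minimal sketch under stated assumptions: `BoundedFrontier` and its `capacity` parameter are hypothetical names that do not appear in the question's code, and dropping links trades crawl completeness for bounded memory. It also does nothing about the `processed` set, which still grows with every visited URL.

```java
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical wrapper: a priority frontier that refuses new entries once it is full. */
class BoundedFrontier<T extends Comparable<T>> {
    private final PriorityBlockingQueue<T> queue = new PriorityBlockingQueue<>();
    private final int capacity; // assumed upper bound on queued items

    BoundedFrontier(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Adds the item unless the frontier is at capacity.
     * Never blocks, so worker threads cannot end up all waiting for space.
     * Synchronized so the size check and the insert happen as one step.
     *
     * @return true if the item was queued, false if it was dropped
     */
    synchronized boolean offer(T item) {
        if (queue.size() >= capacity) {
            return false;
        }
        return queue.offer(item);
    }

    /** Removes and returns the head (least element per compareTo), or null if empty. */
    T poll() {
        return queue.poll();
    }

    /** Returns the number of items currently queued. */
    int size() {
        return queue.size();
    }
}
```

In the crawler, `enque()` could call `offer()` and simply skip (or log) a link when it returns `false`. Whether dropped links should be written to disk for later is a separate design decision.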

1 Answer

"I tried to allocate memory by changing eclipse.ini setting to 2048 MB of ram as it was answered in this topic but still get the same errors after 3 hours or less."

I hate to repeat myself(*), but in eclipse.ini you set up the memory for Eclipse, which has nothing to do with the memory for your crawler.

When using the command line, you need to start it via `java -Xmx2G pkg.crawler.WebCrawler`.

When starting from Eclipse, you need to add `-Xmx2G` to the run configuration ("VM arguments" rather than "Program arguments").
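A quick way to confirm that the setting reached the crawler's JVM, rather than Eclipse's own, is to print the heap limit from inside the program. This is a small sketch using the standard `Runtime` API. The class name `HeapCheck` and its helper methods are made up for illustration. The same calls can be logged periodically to watch memory grow, as suggested in the comments on the question.

```java
/** Hypothetical helper for checking the heap limit the running JVM actually has. */
class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Returns the heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // With -Xmx2G in effect, the first value should be in the region of 2048,
        // although the exact number reported can differ between JVMs.
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

If the reported maximum stays at the default after changing the configuration, the flag was most likely placed under "Program arguments" or in eclipse.ini.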


(*) Link to a deleted question; requires some reputation to view.

maaartinus
  • I will try this too, but will it let the program run for a long time without stopping again? I have 4 GB of RAM on my PC. Thanks – medo0070 Jun 20 '15 at 02:56
  • @medo0070 It's a pity that you didn't understand what I wrote on CR - now you could get more useful answers here. However, I think that it's actually a good fit for CR and [asked to undelete your question](http://meta.codereview.stackexchange.com/q/5482/14363). – maaartinus Jun 20 '15 at 05:09
  • @medo0070 It's been undeleted and reopened. Feel free to add some explanation of what gets stored, why it's needed, and whether it gets removed again and when. Do not edit the code even if it has improved in the meantime, as someone may be reviewing it already (or not, as it's rather long). Also tell us how long it's expected to run. – maaartinus Jun 20 '15 at 05:51
  • Thank you for your help. I expect the program to run forever, until I terminate it. So I think there is a problem with the PriorityQueue, or maybe in the jsoup usage or somewhere else, but so far I have not figured out how to solve this issue. For now I will try to implement the answers I've got in this post and will wait for new answers that may help me. – medo0070 Jun 20 '15 at 12:04