
I'm scraping data from multiple web pages using Jsoup. How can I save the scraped data to a file without overwriting the data from the previously scraped page?

I've tried searching Stack Overflow and the Jsoup docs for a solution.

        int j = 0;
        int i = 0;
        String URL = ("https://www.ufc.com/athletes/all?gender=All&search=&page="+j);
        Document doc = Jsoup.connect(URL).userAgent("mozilla/70.0.1").get();
        Elements temp = doc.select("div.c-listing-athlete__text");



        for (Element fighterList:temp) {
            i++;
            System.out.println(i + " " + fighterList.getElementsByClass("c-listing-athlete__name").first().text());
        }



        j++;
        URL = ("https://www.ufc.com/athletes/all?gender=All&search=&page="+j);
        doc = Jsoup.connect(URL).userAgent("mozilla/70.0.1").get();
        temp = doc.select("div.c-listing-athlete__text");

        for (Element fighterList:temp) {
            i++;
            System.out.println(i + " " + fighterList.getElementsByClass("c-listing-athlete__name").first().text());
        }
J.Mead
  • You don't show the code where you actually write the file, but the obvious thing would be to create a new filename for each page, probably based on what you have just downloaded (e.g. `athletes1`, `athletes2`, etc., depending on the page number) – Thomas Timbul Nov 08 '19 at 11:10
  • And a "meta-Java" approach: use an "append pipe", `java MyClass >> out.txt`, when invoking from the command line (works on Linux and Windows). This "pipes" everything `MyClass` prints with `System.out.println` into `out.txt` (`>` replaces the file, `>>` appends to it on each program run). See https://stackoverflow.com/q/5342832/592355 – xerx593 Nov 08 '19 at 11:19
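
The one-file-per-page idea from the first comment could look roughly like the sketch below. The class name `PageFiles`, the helper `writePage`, and the `athletes<N>.txt` naming scheme are assumptions made for illustration, and the hard-coded names stand in for the text scraped from each page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class PageFiles {

    // Hypothetical helper: writes one page's names to its own file,
    // e.g. athletes0.txt, athletes1.txt (naming scheme is an assumption).
    static Path writePage(Path dir, int page, List<String> names) throws IOException {
        Path file = dir.resolve("athletes" + page + ".txt");
        // Files.write creates the file, or truncates it if it already exists,
        // so each page only ever replaces its own file.
        return Files.write(file, names, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fighters");
        // Placeholder data standing in for the names scraped from each page.
        writePage(dir, 0, List.of("Fighter A", "Fighter B"));
        writePage(dir, 1, List.of("Fighter C"));
        System.out.println("Wrote files to " + dir);
    }
}
```

Because the page number is part of the filename, scraping page 1 cannot overwrite what was saved for page 0.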

1 Answer


If you need to save the data from within the code, open one file before the loop and write every page to it. Maybe this can help you:

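import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The statements below go inside a method.
int count = 0;         // running fighter number across all pages
int pagesNumber = 10;  // number of listing pages to scrape

// try-with-resources closes the writer even if a request fails
try (BufferedWriter out = new BufferedWriter(
        new FileWriter(System.currentTimeMillis() + "out.txt"))) {

    for (int page = 0; page < pagesNumber; page++) {

        String url = "https://www.ufc.com/athletes/all?gender=All&search=&page=" + page;
        Document doc = Jsoup.connect(url).userAgent("mozilla/70.0.1").get();
        Elements fighters = doc.select("div.c-listing-athlete__text");

        for (Element fighter : fighters) {
            Element name = fighter.getElementsByClass("c-listing-athlete__name").first();
            if (name == null) {
                continue; // skip entries without a name element
            }
            count++;
            out.write(count + " " + name.text());
            out.newLine(); // one fighter per line
        }
    }

} catch (IOException e) { // network or file error
    System.err.println("Error: " + e.getMessage());
}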
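
The code above writes to a fresh timestamped file on every run. If the goal is instead a single file that keeps growing, the file can be opened in append mode. The following is a minimal sketch under that assumption: the class name `AppendDemo`, the helper `appendLines`, and the placeholder names are invented for illustration. With `FileWriter`, the equivalent is the two-argument constructor `new FileWriter(fileName, true)`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class AppendDemo {

    // Hypothetical helper: appends lines to the end of the file,
    // creating the file first if it does not exist yet.
    static void appendLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fighters", ".txt");
        // Placeholder data standing in for two scraped pages.
        appendLines(file, List.of("1 Fighter A", "2 Fighter B")); // page 0
        appendLines(file, List.of("3 Fighter C"));                // page 1
        // The second call adds to the file instead of replacing it.
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```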

Hope it helps :)

J. Lorenzo
  • Thank you, this helped a lot. I'd been trying to loop the page iteration. Also, I'm 7 weeks into my uni course, so I've still got a lot to learn – J.Mead Nov 08 '19 at 14:07