I know this error has been asked about many times, but I can't find where my problem is. The error shown is the following:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 393, Size: 393
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at scraping.complementos_juegos.main(complementos_juegos.java:305)
There are many things I don't understand. The first line shows Index: 393, Size: 393. What does that mean? Are those the index and the size of the array?
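To make the question concrete, here is a minimal sketch of how I believe such a message can arise, assuming (I am not sure this is what happens in my program) that get is called with an index equal to the number of elements in the list. The class IndexDemo and the method failsAt are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexDemo {
    // Returns true if list.get(index) throws on a list holding `size` elements.
    static boolean failsAt(int size, int index) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add("item " + i);
        }
        try {
            list.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // The exact wording of the message depends on the Java version.
            System.out.println(e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAt(393, 392)); // 392 is the last valid index
        System.out.println(failsAt(393, 393)); // 393 is one past the end
    }
}
```

If this reading is right, a list of size 393 only has valid indices 0 to 392, so asking for index 393 fails.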
Let's go to the code:
1.- I scrape more than 2,700 links, which are saved in an array called all_links. As I want to store a lot of information, I am using a two-dimensional ArrayList called listaEmpresaA:
ArrayList<ArrayList<String>> listaEmpresaA = new ArrayList<ArrayList<String>>();
String[] paises = {"USA"};
int total_columnas = 2 + (paises.length * 3);

//CREATING THE COLUMNS
for (int i = 0; i < total_columnas; i++) {
    listaEmpresaA.add(new ArrayList<String>());
}

//DEFINITION OF THE ROWS
//<--------------- START OF THE HEADER DEFINITION
listaEmpresaA.get(0).add("Juego");
listaEmpresaA.get(1).add("URL");
for (z = 0; z < paises.length; z++) {
    for (int j = 2; j < total_columnas; j = j + 3) {
        listaEmpresaA.get(j).add(paises[z]);
        listaEmpresaA.get(j + 1).add(paises[z] + " Gold");
        listaEmpresaA.get(j + 2).add(paises[z] + " sin Gold");
    }
}
int filas = 1; //JUST TO KNOW THE AMOUNT OF ROWS I HAVE
//<--------------- FINISH OF THE HEADER DEFINITION
//<--------------- STARTING OF THE SCRAPING FOR EACH LINK
int contador_juegos = 1;
for (String link : all_links) {
    String urlPage = "https://www.microsoft.com" + link;
    System.out.println(contador_juegos + ".- Comprobando entradas de: " + urlPage);
    if (getStatusConnectionCode(urlPage) == 200) {
        Document document = getHtmlDocument(urlPage);

        Elements entradas = document.select("div.page-header div.m-product-detail-hero-product-placement div.context-product-placement-data");
        for (Element elem : entradas) {
            String titulo = elem.getElementsByClass("c-heading-2").text();
            System.out.println(titulo + "\n");
            listaEmpresaA.get(0).add(titulo);
            listaEmpresaA.get(1).add(urlPage);
        }

        entradas = document.select("div.price-info");
        for (Element elem : entradas) {
            String titulo = elem.getElementsByTag("s").text();
            System.out.println("Precio base: " + titulo + "\n");
            listaEmpresaA.get(2).add(titulo);
        }

        entradas = document.select("div.price-info");
        for (Element elem : entradas) {
            String titulo = elem.getElementsByClass("price-disclaimer").text();
            System.out.println("Precio para los miembros sin GOLD: " + titulo + "\n");
            listaEmpresaA.get(3).add(titulo);
        }

        entradas = document.select("dd.cli_upsell-options div.cli_upsell-option");
        // Iterate over each of the entries
        for (Element elem : entradas) {
            String titulo = elem.getElementsByClass("price-disclaimer").text();
            System.out.println("Precio para los miembros GOLD: " + titulo + "\n");
            listaEmpresaA.get(4).add(titulo);
        }
        filas++;
    }
    contador_juegos++;
}
//<--------------- FINISH OF THE SCRAPING FOR EACH LINK BAZAR USA
2.- I create the Excel file and store the information from the listaEmpresaA ArrayList in it.
try {
    //create .xls and create a worksheet.
    FileOutputStream fos = new FileOutputStream("D:\\mierda.xls");
    HSSFWorkbook workbook = new HSSFWorkbook();
    HSSFSheet worksheet = workbook.createSheet("XboxOne");
    int l = 0;
    //CREATING EXCEL ROWS
    for (int f = 0; f < filas; f++) {
        HSSFRow fila = worksheet.createRow(f);
        //CREATING EXCEL COLUMNS
        for (int c = 0; c < total_columnas; c++) {
            HSSFCell celda = fila.createCell(c);
            celda.setCellValue(listaEmpresaA.get(c).get(f)); //<----- THIS IS THE LINE 305 WHERE I HAVE THE ERROR
            l++;
        }
    }
    //Save the workbook in .xls file
    workbook.write(fos);
    fos.flush();
    fos.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
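As a debugging aid, here is a minimal sketch of a check that could be run before writing the Excel file. It assumes (I have not confirmed this) that the columns of listaEmpresaA may end up with different lengths, for example if a selector matches zero or several elements for some link while filas still grows by one. The class ColumnCheck, the method shortestColumn, and the sample values are hypothetical and not part of my program.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnCheck {
    // Returns the size of the shortest column, or 0 if there are no columns.
    static int shortestColumn(List<? extends List<String>> columns) {
        if (columns.isEmpty()) {
            return 0;
        }
        int min = Integer.MAX_VALUE;
        for (List<String> column : columns) {
            min = Math.min(min, column.size());
        }
        return min;
    }

    public static void main(String[] args) {
        ArrayList<ArrayList<String>> lista = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            lista.add(new ArrayList<>());
        }
        // Header row: every column gets one value.
        lista.get(0).add("Juego");
        lista.get(1).add("URL");
        lista.get(2).add("USA");
        // One scraped link: columns 0 and 1 get a value, column 2 gets none.
        lista.get(0).add("Some game");
        lista.get(1).add("https://example.invalid/some-game");
        int filas = 2;

        for (int c = 0; c < lista.size(); c++) {
            System.out.println("Column " + c + " has " + lista.get(c).size() + " entries");
        }
        if (shortestColumn(lista) < filas) {
            System.out.println("At least one column is shorter than filas = " + filas);
        }
    }
}
```

Comparing each column's size against filas in this way would show whether get(c).get(f) can be asked for a row that a given column does not have.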
I have several questions, and I would really appreciate some tips so that I can find the solution myself:
1.- I don't understand why the error reports Index: 393, Size: 393 when the program had already reached link 2730 (of 2,751 links in total). The last output shown on the console is:
2730.- Comprobando entradas de: https://www.microsoft.com/en-us/store/p/star-wars-pinball-season-1-bundle/brz3mqfjnlmw
Star Wars™ Pinball Season 1 Bundle
Precio base:
Precio para los miembros sin GOLD:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 393, Size: 393
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at scraping.complementos_juegos.main(complementos_juegos.java:305)
2.- When I use the for-each loop, I have noticed that it does not exactly follow the order of the array. I don't know why.
3.- The program takes 1 hour just to scrape the information from one store, and I want to do this for more than 50 stores. Is there a way to reduce this time? I have read something about "HashMap", but I don't know how to use it. If it is a better solution, I will take a look.
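Regarding the running time, one direction that might help is to process several links at the same time with a thread pool instead of one after another. The following is only a minimal sketch under the assumption that the work for each link is independent. fetchTitle is a hypothetical stand-in for the real download and parsing (it does no network access here), and ParallelScrape, fetchAll, and the pool size are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScrape {
    // Hypothetical stand-in for downloading and parsing one page.
    static String fetchTitle(String url) {
        return "title of " + url;
    }

    // Runs fetchTitle for every link on a fixed-size pool.
    // invokeAll returns the futures in the same order as the submitted tasks.
    static List<String> fetchAll(List<String> links, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String link : links) {
                tasks.add(() -> fetchTitle(link));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Fetching failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("/en-us/store/p/a", "/en-us/store/p/b");
        for (String title : fetchAll(links, 8)) {
            System.out.println(title);
        }
    }
}
```

Because the results come back in input order, each result could still be matched to its link, whatever order the pages finish downloading in.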
Thanks in advance!
Have a good day.