
I am starting with Java/Tomcat, and I am struggling with a problem that was very easy to solve with C++.

My webservice (single webapp) works by using the input values to look up the numeric answer in a large, pre-calculated table. I am struggling with the initialization of this table.

Problem details:

  • The data table is huge (3000x3000);
  • The data is pre-computed and this computation is very costly (it takes hours);
  • The data is static: it will never change after it is calculated for a given instance.

In C++, I would just define a static const array and initialize it inline. I was not able to do this in Java. Apparently there's no concept of static data initialization in Java: the compiler generates initialization code, and that code cannot be larger than 64 KB. In fact, I couldn't even load the file with the static initialization in Eclipse; it would hang.

So I need to initialize the table from a static file on disk. I tried to place a .csv file in WEB-INF/static, but found no way to open it reliably from inside my Java code (for example, the absolute path will be different on my development and production environments).

This is my current class definition (with mocked-up data for the initialization):

```java
package com.hmt.restjersey;

public final class G {
    public static final float[][] data = new float[3000][3000];

    // TODO: actual initialization from file
    static {
        Logger.writeEventLog("Initializing G table...");

        for (int alpha = 0; alpha < 3000; alpha++) {
            for (int beta = 0; beta < 3000; beta++) {
                data[alpha][beta] = 1.0f / (1 + alpha + beta);
            }
        }
        Logger.writeEventLog("G table initialized.");
    }
}
```

So, my questions:

  • How to reliably access the data file (WEB-INF/static/data.csv) to initialize the table?
  • Is a .csv file the best way to load numeric data efficiently?

Also, since the table is huge, I would like to have a single instance of it in the server to save memory and speed up initialization. How do I ensure that there will be a single instance shared by all servlets?

  • How many webapps (WARs) are you going to have that access this huge piece of data? If all your servlets are in the same webapp, they can easily share data either through static vars (because it's the same classloader) or through the ServletContext, which is application-scoped (as a side note, I recommend some care with this, especially when it comes to OOD and to issues of distribution, but that's another issue) – Pelit Mamani Nov 08 '16 at 16:51
  • You probably can add data file into classpath and then load data via `this.getClass().getClassLoader().getResourceAsStream("data.csv")`. Regarding best way to load numeric data efficiently, you also can use `csv` or use prepared data serialized by `protobuf` or other serialization tool. – dmitrievanthony Nov 08 '16 at 16:52
  • Certainly there is static data in Java. – user207421 Nov 08 '16 at 18:47
    What I meant was static data initialization, not the "static" keyword. In C if you declare a variable like: "const static data[]={1 ,2, 3}" the data will be accessed directly from the binary. In Java the array will be created and initialized at runtime. – Alexandre de Menezes Nov 09 '16 at 04:27
  • Not really so huge: 3,000 * 3,000 * 4 = 36,000,000 = 36 megs to hold 32-bit (4 octets) `float` primitives. – Basil Bourque Apr 18 '17 at 23:25
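Picking up the comments on file format and size: a raw binary dump of the floats avoids parsing 9,000,000 text values at every startup, and for 3000 x 3000 it is 8 header bytes plus the 36,000,000 bytes computed above. This is a minimal sketch, not a standard format: the layout (an `int` row count, an `int` column count, then the values row by row, big-endian as written by `java.io.DataOutputStream`) and the class name `FloatTableIO` are assumptions made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: stores a rectangular float table as raw binary.
// Assumed layout: int rows, int cols, then rows*cols floats in row-major order (big-endian).
final class FloatTableIO {
    private FloatTableIO() {}

    static void write(float[][] table, OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(out));
        int rows = table.length;
        int cols = rows == 0 ? 0 : table[0].length;
        dos.writeInt(rows);
        dos.writeInt(cols);
        for (float[] row : table) {
            if (row.length != cols) {
                throw new IllegalArgumentException("table is not rectangular");
            }
            for (float v : row) {
                dos.writeFloat(v);
            }
        }
        dos.flush(); // push buffered bytes to 'out'; closing is left to the caller
    }

    static float[][] read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        int rows = dis.readInt();
        int cols = dis.readInt();
        float[][] table = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                table[r][c] = dis.readFloat(); // EOFException if the file is truncated
            }
        }
        return table;
    }
}
```

The hours-long offline computation would call `write` once; the webapp would call `read` on startup with whatever stream it opens for the file.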



That's my two cents:

  1. Regarding memory sharing, if all your servlets are in the same WAR (webapp), then they share static vars (because it's the same classloader), but it's even nicer to use the ServletContext, which is meant for exactly this (see ServletContext).
  2. As the ServletContext example (link above) shows, you don't necessarily need a static initializer - you can use ServletContextListener to init on application startup (btw you could also do initialization on-demand, in the 'getter' of your huge data).

  3. If you'd like to share memory between two different WARs, I don't know a straightforward solution. Theoretically it can be shared if the class with the static var is in TOMCAT_HOME/lib, but IMHO that's confusing and weird.

  4. Putting the calculation in file/storage is a great idea, because you might find yourself restarting Tomcat!
  5. As to how to locate the file, I agree with dmitrievanthony's comment regarding getResourceAsStream. Basically it lets you take the file from your classpath (the same one used for locating code); one simple example would be putting it in /WEB-INF/classes/data.csv, see example code here (I personally like it when this approach is wrapped in "Resource" from the Spring framework, but that could be overkill).
  6. Please note: as mentioned in my comment above, I tried to answer your direct questions for the design you chose, but if I were in your shoes I'd stop to reconsider this design (e.g. is it easy to distribute between servers? Is it modular and unit-testable? Could "data.csv" be replaced with a database, or MongoDB, or even a separate "dataService" WAR?). But please ignore this remark if you've already considered it...
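For items 2 and 5, a minimal sketch of parsing the table from a classpath stream, assuming the CSV has one table row per line, comma-separated values, and no header; `CsvTable` and `load` are hypothetical names. It only needs an `InputStream`, so in the webapp the stream could come from `getResourceAsStream("data.csv")` with the file in /WEB-INF/classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical CSV loader: one row per line, comma-separated floats, no header.
final class CsvTable {
    private CsvTable() {}

    static float[][] load(InputStream in, int rows, int cols) throws IOException {
        if (in == null) {
            // getResourceAsStream returns null (it does not throw) when the file is missing
            throw new IOException("data file not found on classpath");
        }
        float[][] table = new float[rows][cols];
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            for (int r = 0; r < rows; r++) {
                String line = reader.readLine();
                if (line == null) {
                    throw new IOException("expected " + rows + " rows, got " + r);
                }
                String[] fields = line.split(",");
                if (fields.length != cols) {
                    throw new IOException("row " + r + ": expected " + cols
                        + " values, got " + fields.length);
                }
                for (int c = 0; c < cols; c++) {
                    table[r][c] = Float.parseFloat(fields[c].trim());
                }
            }
        }
        return table;
    }
}
```

Checking the row and column counts up front means a truncated or malformed file fails loudly at startup instead of leaving zeros in the table.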

Edited: ServletContext example, without static fields:

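```java
// Class to encapsulate the data:
public class G {
    private final double[][] data;
    private G(double[][] data) { this.data = data; }

    public static G loadData() {
        double[][] data = ...; // complex loading
        return new G(data);
    }
}

// Usage in ServletContextListener (registered via @WebListener or web.xml):
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class MyListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        G g = G.loadData();
        sce.getServletContext().setAttribute("myData", g);
    }
    @Override
    public void contextDestroyed(ServletContextEvent sce) { }
}

// Usage in servlet:
protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
    G g = (G) getServletContext().getAttribute("myData");
}
```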

Singleton pattern alternative (but I suggest care in terms of testability and modularity; you may also want to look at frameworks such as Spring MVC, but let's start simple):

```java
// Singleton:
public class G {
    private static volatile G instance;
    private final double[][] data;

    private G(double[][] data) { this.data = data; }

    public static G getInstance() {
        // I don't synchronize because I rely on ServletContextListener to initialize once
        if (instance == null)
            instance = new G(...); // complex loading
        return instance;
    }
}
// ServletContextListener:
public void contextInitialized(ServletContextEvent sce) {
    G.getInstance();
}
// Usage in servlet:
protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
    G g = G.getInstance(); // I don't like it in terms of OOD, but it works
}
```
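If you'd rather not depend on the listener running before the first request, the initialization-on-demand holder idiom gives lazy, thread-safe, one-time loading without explicit locking: the JVM initializes the nested `Holder` class at most once, on the first call to `getInstance()`. A minimal sketch; `LazyG` and `loadTable()` are hypothetical names, `loadTable()` stands in for the real loading and just fills a small table with the question's mock formula, and `LOAD_COUNT` exists only to show the load runs once.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Singleton via the initialization-on-demand holder idiom.
final class LazyG {
    // Illustration only: counts how many times the table is built.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private final float[][] data;

    private LazyG(float[][] data) {
        this.data = data;
    }

    // Holder is not initialized until getInstance() first touches it;
    // class initialization is thread-safe and runs once per class loader (i.e. per webapp).
    private static final class Holder {
        static final LazyG INSTANCE = new LazyG(loadTable());
    }

    static LazyG getInstance() {
        return Holder.INSTANCE;
    }

    float get(int alpha, int beta) {
        return data[alpha][beta];
    }

    // Hypothetical placeholder for the real loading (e.g. from data.csv).
    private static float[][] loadTable() {
        LOAD_COUNT.incrementAndGet();
        int n = 4; // the real table would be 3000 x 3000
        float[][] t = new float[n][n];
        for (int alpha = 0; alpha < n; alpha++) {
            for (int beta = 0; beta < n; beta++) {
                t[alpha][beta] = 1.0f / (1 + alpha + beta);
            }
        }
        return t;
    }
}
```

A ServletContextListener can still call `LazyG.getInstance()` at startup to pay the loading cost early; the difference is that correctness no longer depends on it.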
Pelit Mamani
  • 1. Yes, it is a single WAR (I edited the question to clarify this); 2. The ServletContext seems way too complicated, especially with the huge amounts of data I need to initialize; 5. This won't work in my static class, since I don't have a "this"; 6. The app is way more complicated than this; I have other tables that need to be in a DB because they change and need to be shared between servers. – Alexandre de Menezes Nov 09 '16 at 04:41
  • Thanks for clarifying. I've edited the answer to include code outlines for either ServletContext or Singleton. Note that your remark (2) isn't so bad if the data and complexities are encapsulated in "class G". Regarding (5), you don't need "this"; you can use, for example, "Thread.currentThread().getContextClassLoader()" (see the link in (5)). – Pelit Mamani Nov 09 '16 at 06:56
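On the "no this" point: in a static context the class literal (e.g. `G.class`) can replace `this.getClass()`. A minimal sketch of a resource-opening helper that tries the thread context class loader first, as the comment suggests, and falls back to the class's own loader; `Resources.open` is a hypothetical name. The explicit null check matters because `getResourceAsStream` returns `null` rather than throwing when the file isn't found.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

final class Resources {
    private Resources() {}

    // Opens a classpath resource such as "data.csv" (i.e. /WEB-INF/classes/data.csv in a WAR).
    static InputStream open(String name) throws FileNotFoundException {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        InputStream in = (ctx != null) ? ctx.getResourceAsStream(name) : null;
        if (in == null) {
            // Fall back to the loader of this class; Resources.class works without a 'this'.
            ClassLoader own = Resources.class.getClassLoader();
            in = (own != null) ? own.getResourceAsStream(name)
                               : ClassLoader.getSystemResourceAsStream(name);
        }
        if (in == null) {
            throw new FileNotFoundException("classpath resource not found: " + name);
        }
        return in;
    }
}
```

Inside `G`'s static block this would be used as `try (InputStream in = Resources.open("data.csv")) { ... }`, wrapping any `IOException` in an `ExceptionInInitializerError` so a missing file fails at class initialization rather than on the first request.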