I am trying to insert a large amount of data (around 100k objects) into a MySQL database. See below:

        EntityManager em = emf.createEntityManager();
        EntityTransaction transaction = em.getTransaction();

        long startTime = System.nanoTime(); // used for timing the insert
        transaction.begin();
        for(int y = 0; y < 100000; y++) {
            // Each vector gets 10000 coordinates: 0.0, 1.0, ..., 9999.0
            RealVector real = new RealVector(10000);
            for(int i = 0; i < 10000; i++) {
                real.getCoordinates().add((float) i);
            }

            em.persist(real);
        }

        // Everything is written in one transaction
        transaction.commit();

The class looks something like this:

    @TableGenerator(name = "vector_gen", table = "id_gen", pkColumnName = "gen_name", valueColumnName = "gen_val", pkColumnValue = "REAL_VECTOR", allocationSize = 100)
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "vector_gen")
    private int id;
    private int dimension;
    @ElementCollection
    private List<Float> coordinates;

    public RealVector() {

    }

    public RealVector(int dimension) {
        this.dimension = dimension;
        this.coordinates = new ArrayList<Float>();
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public int getDimension() {
        return dimension;
    }

    public void setDimension(int dimension) {
        this.dimension = dimension;
    }

    public List<Float> getCoordinates() {
        return coordinates;
    }

    public void setCoordinates(List<Float> coordinates) {
        this.coordinates = coordinates;
    }

In summary, I am trying to insert 100k objects of type RealVector into the database. Each RealVector holds a list of 10k floats. This takes hours to complete and uses a lot of memory. I am new to MySQL and JPA. Is there any way to make this faster and, if possible, consume less memory?
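
For a sense of scale, here is a rough back-of-the-envelope sketch of what the loop above asks the database to store. It assumes the default `@ElementCollection` mapping, where each list element becomes one row in a separate collection table; the class and method names (`InsertScale`, `collectionRows`, `rawFloatBytes`) are made up for illustration and are not part of the original code.

```java
// Back-of-the-envelope sizing for the insert described in the question.
// Assumption: @ElementCollection stores one row per list element.
class InsertScale {

    // Number of rows in the collection table: one per coordinate per vector.
    static long collectionRows(long vectors, long coordinatesPerVector) {
        return vectors * coordinatesPerVector;
    }

    // Raw size of the float values alone (4 bytes each), ignoring keys,
    // indexes, and any per-row overhead in the database.
    static long rawFloatBytes(long rows) {
        return rows * Float.BYTES;
    }

    public static void main(String[] args) {
        long rows = collectionRows(100_000, 10_000);
        System.out.println("collection rows: " + rows);
        System.out.println("raw float payload (bytes): " + rawFloatBytes(rows));
    }
}
```

Under that assumption the loop amounts to about a billion collection rows on top of the 100k `RealVector` rows, all inside a single transaction. Since nothing is flushed or cleared along the way, the persisted entities also stay referenced by the `EntityManager` until the end, which is consistent with the high memory use.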

p192
  • Take a look at https://stackoverflow.com/questions/9664821/is-jdbc-multi-threaded-insert-possible and see if it can help? – Woodrow Mar 27 '18 at 19:25
  • You could move your inner loop out; it's replicating the same `RealVector` state `100000` times. – Elliott Frisch Mar 27 '18 at 19:28
  • I don't see it. So I create a RealVector object and add 10k floats to its array. Then persist the object with its 10k floats. Then repeat for another ~100k RealVectors. How would I do this by moving the inner loop out? – p192 Mar 27 '18 at 19:34
  • Which 10k floats do you add to that array? The same 10k floats, isn't it? Create the object once. Then persist it 100k times. Should be the same result. But create ~1m less objects. – Elliott Frisch Mar 27 '18 at 19:39
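
One way to read the suggestion in the comments, combined with a pattern often used for large JPA inserts, is sketched below. Two caveats apply. First, calling `persist` again on an entity that is already managed does not insert another row, so a new `RealVector` per row is presumably still needed; what can be hoisted out of the loop is building the coordinate values. Second, the sketch does not use JPA directly: `PersistenceOps` is a hypothetical stand-in interface that mirrors the `persist`, `flush`, and `clear` methods of `EntityManager`, so the example compiles on its own. `BatchedPersist`, `buildTemplate`, `persistInBatches`, and the batch size are likewise assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in for the three EntityManager methods used here.
interface PersistenceOps {
    void persist(Object entity);
    void flush();
    void clear();
}

class BatchedPersist {

    // Builds the values 0.0 .. n-1 once, like the inner loop in the question.
    static List<Float> buildTemplate(int n) {
        List<Float> template = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            template.add((float) i);
        }
        return Collections.unmodifiableList(template);
    }

    // Persists `count` entities produced by `factory`, flushing and clearing
    // after every `batchSize` entities so they do not all stay in memory.
    // Returns the number of flush/clear rounds performed.
    static int persistInBatches(PersistenceOps ops, int count, int batchSize,
                                IntFunction<?> factory) {
        int rounds = 0;
        for (int y = 0; y < count; y++) {
            ops.persist(factory.apply(y));
            if ((y + 1) % batchSize == 0) {
                ops.flush();
                ops.clear();
                rounds++;
            }
        }
        // Flush whatever is left over from an incomplete final batch.
        if (count % batchSize != 0) {
            ops.flush();
            ops.clear();
            rounds++;
        }
        return rounds;
    }
}
```

With the real API, the `ops` calls would go to the `EntityManager` inside the existing transaction, and the factory would create a new `RealVector` and fill it with `getCoordinates().addAll(template)`. Whether this helps, and by how much, depends on the JPA provider and JDBC driver settings, so it would need to be measured.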

0 Answers