I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.
The main thing I'd do for Java here is actually avoid modeling each particle as an object. It should be raw data like just some plain old data like floats or ints. You want to be able to work with contiguity guarantees for spatial locality with sequential processing and not pay for the cost of padding and the 8-byte overhead per class instance. Split cold fields away from hot fields.
For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS representation here which has to load in a particle's color into cache lines during the sequential physics pass only to evict it and not use it. Cram as much relevant memory used together into a cache line as you can by separating it away from the irrelevant memory for a particular pass.
Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1
can be used to indicate the end of the list as well as empty cells.
To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.
The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next
index overhead per particle.
I made something similar to this some years back in C using the technique above (but not with as many interesting particle interactions as the demo game) which could handle about 10 mil particles before it started to go below 30 FPS and on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.
Data structure:

As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.
To figure out which cell a particle occupies, just do:
cell_x = (int)(particle_x[particle_index] / cell_size)
cell_y = (int)(particle_y[particle_index] / cell_size)
cell_index = cell_y * num_cols + cell_x
... much cheaper constant-time operation than traversing a tree structure and having to rebalance it as particles move around.