
I am designing a multi-threaded piece of code in which several sensors are queried (through sockets); their data is stored first in a Vector and subsequently written to a DB.

The entire process is time-sensitive, since each sensor updates with new data every few seconds. If the data is not retrieved in time, it is lost. Currently, I have a Vector of a custom sensor data class that stores the information obtained from and about each of the sensors.

The plan was to open a thread for each sensor (say, 40-50 in total, but I do not want to fix the number in case more sensors are added later) and have it access and fill one particular cell of the Vector, determined by its index.

Is such an operation on the Vector allowed and prudent? Also, knowing the peculiarities of TCP/IP sockets, am I likely to drastically speed up the process by introducing threads (as opposed to, say, running everything in a single thread)? Is there a better or more elegant way of doing this?

ArtforLife
  • *The plan was to open a thread for each sensor (say, 40-50 in total, but do not want to limit by number in case more sensors are added later)* Do not do that. Creating many threads is a) expensive and b) likely to slow down your system. It would be better to use tasks. If you have sockets, you can create a task to handle the communication and submit that task to an [`ExecutorService`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html) (e.g. a [`ThreadPoolExecutor`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html)). – Turing85 Jul 14 '15 at 15:28
  • To what indices would a sensor write? Would each sensor have exclusive use of one index in the vector? Also note that you are better off not using Java's `Vector` class. http://stackoverflow.com/questions/1386275/why-is-java-vector-class-considered-obsolete-or-deprecated?rq=1 – E_net4 Jul 14 '15 at 15:29
  • That was the plan, yes. For instance, Vector[1] contains information about and from Sensor #1, Vector[2] ~ Sensor #2, and so on. Does that mean that if two different threads try to modify the structure at different indices, an exception will be thrown? – ArtforLife Jul 14 '15 at 15:38
  • It is not safe to concurrently modify an ArrayList; I suppose the same applies to the Vector class. It would be safe with a normal array if there are no overlapping indices, but before reading the data from other thread(s) you still have to do some kind of synchronization. For a case like this there is the AtomicReferenceArray class, but I'm not sure it fits, because you would risk losing old entries before they have been saved to the database. – Shepard Jul 14 '15 at 18:04

3 Answers


From what you write, it seems that a Queue is a better fit: your threads push sensor data onto the queue, and later (perhaps in another thread) you take elements from the queue and process them. Java (at least versions 7 and 8) offers several queue implementations, including ones intended for use in a multithreaded environment. As Turing85 wrote in his comment, consider using a thread pool instead of creating a thread for each sensor.

EDIT: reading the comments, it seems there are two different kinds of problems

how to efficiently query the sensors (threads, tasks, pools, etc)

From the question it seems that you connect to the sensors to read the data, and that this must be done at a fixed rate for each sensor. You can use a ScheduledThreadPoolExecutor and its method scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit), where the Runnable is the object that reads the data from a sensor; you must schedule a task for each sensor, and the thread pool size is specified in the constructor. To keep the number of threads small, do as little as possible in the class that reads the data. I suggest you put the data in a Queue, a Map or a Set, depending on how your data is structured. A Map plays the same role as the Vector you proposed, but instead of an index you can use a generic key to insert the data, and you do not have to worry about sizing the collection.
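The scheduling idea above could look roughly like the following sketch. It assumes a fixed polling period shared by all sensors; `readSensor`, `pollUntilAllReported` and the class name are hypothetical stand-ins (the real code would read from each sensor's socket), and a `ConcurrentHashMap` keyed by sensor id takes the place of the indexed Vector.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SensorPollingSketch {

    // Hypothetical stand-in for the real socket read; replace with actual I/O.
    static String readSensor(int sensorId) {
        return "reading-from-" + sensorId;
    }

    // Schedules one fixed-rate task per sensor on a small shared pool and
    // returns the latest reading per sensor once each has reported at least once
    // (or after a 5 second timeout).
    static Map<Integer, String> pollUntilAllReported(int sensorCount, int poolSize, long periodMillis) {
        Map<Integer, String> latest = new ConcurrentHashMap<>();
        CountDownLatch firstReadings = new CountDownLatch(sensorCount);
        ScheduledExecutorService pool = new ScheduledThreadPoolExecutor(poolSize);
        try {
            for (int id = 0; id < sensorCount; id++) {
                final int sensorId = id;
                pool.scheduleAtFixedRate(() -> {
                    // put returns the previous value; null means this is the first reading
                    if (latest.put(sensorId, readSensor(sensorId)) == null) {
                        firstReadings.countDown();
                    }
                }, 0, periodMillis, TimeUnit.MILLISECONDS);
            }
            firstReadings.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // stop polling; a long-running service would keep the pool alive
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<Integer, String> latest = pollUntilAllReported(50, 4, 100);
        System.out.println(latest.size() + " sensors reported");
    }
}
```

Note that the pool size (4 here) is independent of the number of sensors, so adding sensors only adds scheduled tasks, not threads. This works as long as each read returns quickly; a read that blocks for a long time occupies a pool thread for that whole time.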

and how to efficiently organize the data for subsequent database submission

Once the data is in the collection you can read and process it: store it in a database, check for duplicates, or whatever else you need. I prefer having two different "layers", one that collects the data and another that processes what has been collected; putting an "interface" between the two lets you change one side of the design without touching the other.

NOTE: my solution can lose data. If for some reason the server goes down, the data in the collection that has not yet been processed is no longer available.

Giovanni
  • Why use something like `Queue`, if there are already implementations, that schedule tasks (see [my comment above](http://stackoverflow.com/questions/31410725/concurrent-threads-in-vector-java#comment50795019_31410725))? – Turing85 Jul 14 '15 at 15:43
  • Because it was something similar to the poster's logic: a task (a Thread from a pool) that collects the data and puts it in a "container" for later processing (storage, but also anything else) – Giovanni Jul 14 '15 at 15:47
  • I have looked at the Queue just now. Would it allow me to avoid collecting duplicate data? That is, if I pushed something to the queue from Sensor1 and then a few others, there is a possibility that the thread will return to Sensor1 before it changed. Hence, it would push duplicate data into the Queue. I can include checks for that, but you can probably see how it is more difficult to do such checks in a Queue than some indexed structure. Let me know if I am missing something here. – ArtforLife Jul 14 '15 at 15:57
  • Both a vector and a queue allow duplicates; a map or a set lets you insert data and manage uniqueness more easily – Giovanni Jul 14 '15 at 16:07
  • If duplicate items are a problem, you could probably keep the queue and handle those cases at the other end (the container, I suppose). – E_net4 Jul 14 '15 at 16:13
  • I see. It appears to me that the question has been reduced to two now: how to efficiently query the sensors (threads, tasks, pools, etc) and how to efficiently organize the data for subsequent database submission. – ArtforLife Jul 14 '15 at 16:19

As other people suggested, I'd go with a LinkedBlockingQueue: the producers are the threads that query the sensors and put the data in the queue, and the consumer is the thread that sends the records to the database.
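A minimal sketch of that producer/consumer hand-off is shown below. The `Reading` class is a hypothetical stand-in for the custom sensor data class, the producers fabricate readings instead of querying sockets, and the consumer runs on the calling thread and collects into a list where real code would write to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueHandoffSketch {

    // Hypothetical record type standing in for the custom sensor data class.
    static final class Reading {
        final int sensorId;
        final long sequence;

        Reading(int sensorId, long sequence) {
            this.sensorId = sensorId;
            this.sequence = sequence;
        }
    }

    // Producers enqueue readings; the caller acts as the single consumer.
    static List<Reading> collect(int sensorCount, int readingsPerSensor) {
        BlockingQueue<Reading> queue = new LinkedBlockingQueue<>();
        ExecutorService producers = Executors.newFixedThreadPool(4);
        for (int id = 0; id < sensorCount; id++) {
            final int sensorId = id;
            producers.execute(() -> {
                for (long seq = 0; seq < readingsPerSensor; seq++) {
                    // The queue is unbounded here, so add never blocks.
                    queue.add(new Reading(sensorId, seq));
                }
            });
        }
        producers.shutdown(); // already submitted tasks still run to completion

        List<Reading> written = new ArrayList<>();
        int expected = sensorCount * readingsPerSensor;
        try {
            while (written.size() < expected) {
                Reading r = queue.poll(2, TimeUnit.SECONDS);
                if (r == null) {
                    break; // nothing arrived in time; give up
                }
                written.add(r); // a real consumer would insert into the DB here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(collect(5, 3).size() + " readings consumed");
    }
}
```

Because every reading is a separate queue element, a slow database write delays consumption but does not overwrite unsaved data, which is the risk with a one-cell-per-sensor structure.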

The threading model depends on how your sensor behaves. If you make a request but it does not answer until it can send new data, you can safely start a new thread for each sensor. This will allow you to write simple blocking I/O code and avoid using more complicated ways of doing it.

It is true that a large number of threads slows down a program (and maybe the entire system), but that only becomes a concern with thousands or even tens of thousands of them. In addition, your sensors provide data every few seconds, so the readers will be idle most of the time.

Shepard

For anyone interested, I ended up going with `ConcurrentLinkedDeque` for this problem. It seems to have everything I need: queue semantics, with the concurrency managed automatically. P.S. I also ended up using a thread pool.
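For readers who want to try this combination, here is a rough sketch, assuming the class meant is `java.util.concurrent.ConcurrentLinkedDeque`. The string payloads, the pool size and the `fillAndDrain` helper are made up for illustration and are not the code actually used.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class DequeSketch {

    // Pool workers append to the deque; the caller drains it afterwards.
    static int fillAndDrain(int sensorCount) {
        ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int id = 0; id < sensorCount; id++) {
            final int sensorId = id;
            // Hypothetical payload; real code would enqueue the sensor's data object.
            pool.execute(() -> deque.offerLast("sensor-" + sensorId));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int drained = 0;
        // pollFirst does not block; it returns null once the deque is empty.
        while (deque.pollFirst() != null) {
            drained++;
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println(fillAndDrain(20) + " items drained");
    }
}
```

Unlike a `BlockingQueue`, this deque never makes the consumer wait, so a consumer that runs continuously has to decide for itself what to do when `pollFirst` returns null (for example, sleep briefly and retry).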

ArtforLife