I have this method:

import java.nio.charset.StandardCharsets;

// These three fields are shared by every thread that calls WriteToFile.
GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(baos, null);

public void WriteToFile(Record record) {
    // Discard the bytes left over from the previous record.
    baos.reset();
    // Open the file in append mode; try-with-resources closes it.
    try (FileOutputStream fileOut = new FileOutputStream(avroFile, true)) {
        datumWriter.write(record, encoder);
        encoder.flush(); // push the encoded bytes into baos
        fileOut.write("RecordStart\n".getBytes(StandardCharsets.UTF_8));
        baos.writeTo(fileOut);
        fileOut.write("\nRecordEnd\n".getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        logger.error("Error while writing: ", e);
    }
}

The method above is being called by multiple threads, and each thread writes one record between RecordStart and RecordEnd. The output of different threads may interleave, so a record no longer sits between its own RecordStart and RecordEnd. One way to avoid this is to use synchronized, but that will hurt performance because it makes threads wait.

So I want some suggestions: how can we avoid multiple threads writing to the same file at the same time, which may cause interleaving of logs?

Lutzi
Bad Coder
  • Does this help you in any way? https://stackoverflow.com/questions/3109140/logging-in-multi-threaded-application-in-java – dreamcrash Dec 14 '20 at 13:22
  • Create a POJO object and use it as the locking mechanism: set LOCK that coerces other threads to WAIT for it, modify or write the record, then finally issue a NOTIFY to all so that those who wait can start to compete. –  Dec 14 '20 at 13:33

1 Answer

You can only benefit from parallel processing when your operations can be parallelized. By that I mean: if you are writing to a file, this specific step of the computation must be done synchronously, be that via synchronized or via a file lock, or else you'll get scrambled data.

What you can do to improve performance is reduce the synchronized/locked block to the minimum possible, leaving only the very last step (the write itself) inside it. Other than that, you can write to multiple files.
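As a rough sketch of that idea: each thread builds the complete RecordStart/RecordEnd frame in its own buffer, and only the final append runs under a lock. The class name `FramedRecordWriter` and its methods are hypothetical, and the payload is assumed to be a record that the calling thread has already serialized (for example with a per-thread Avro encoder rather than the shared `baos` and `encoder` from the question).

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames a record outside the lock, appends it inside.
public class FramedRecordWriter {
    private static final byte[] START = "RecordStart\n".getBytes(StandardCharsets.UTF_8);
    private static final byte[] END = "\nRecordEnd\n".getBytes(StandardCharsets.UTF_8);

    private final Object fileLock = new Object();
    private final String path;

    public FramedRecordWriter(String path) {
        this.path = path;
    }

    // Builds the whole frame in a buffer local to the calling thread; no lock needed.
    static byte[] frame(byte[] payload) {
        ByteArrayOutputStream buf =
                new ByteArrayOutputStream(START.length + payload.length + END.length);
        buf.write(START, 0, START.length);
        buf.write(payload, 0, payload.length);
        buf.write(END, 0, END.length);
        return buf.toByteArray();
    }

    public void write(byte[] payload) throws IOException {
        byte[] framed = frame(payload); // done in parallel by all threads
        synchronized (fileLock) {       // only the append itself is serialized
            try (OutputStream out = new FileOutputStream(path, true)) {
                out.write(framed);
            }
        }
    }
}
```

Threads still wait for each other during the append, but not while a record is being encoded and framed.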

I would prefer a file lock because it keeps the method more general, should you ever decide to expand it to write to multiple files. It also prevents other processes (besides your program) from using the file in the meantime.
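A minimal sketch of the file-lock variant, assuming Java NIO's `FileChannel.lock()`. One caveat: a `java.nio.channels.FileLock` is held on behalf of the whole JVM, so it coordinates with other processes but not between threads of the same program (a second overlapping lock request from the same JVM throws `OverlappingFileLockException`). The sketch therefore pairs it with an in-process monitor. The class name `LockedAppender` is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends one already-framed record under both locks.
public class LockedAppender {
    // Serializes threads inside this JVM; FileLock alone does not do that.
    private static final Object JVM_LOCK = new Object();

    public static void append(Path file, byte[] framed) throws IOException {
        synchronized (JVM_LOCK) {
            try (FileChannel ch = FileChannel.open(file,
                         StandardOpenOption.CREATE,
                         StandardOpenOption.WRITE,
                         StandardOpenOption.APPEND);
                 // Exclusive lock against other processes; released when the try block exits.
                 FileLock lock = ch.lock()) {
                ByteBuffer buf = ByteBuffer.wrap(framed);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
    }
}
```

Whether other processes actually honor the lock depends on the operating system; on many platforms file locks are advisory.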

Take a look at this question.


Answering the specific question:

So I want some suggestions: how can we avoid multiple threads writing to the same file at the same time, which may cause interleaving of logs?

Without losing performance... I don't think there is a way. The very nature of writing to a file demands it to be sequential.


Most systems I've seen that write all their logs to a single file use a queue and a method that keeps writing record by record for as long as the queue has records to offer. Everything gets written eventually, as long as the system is not constantly receiving more records than the disk can handle.
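The queue approach could look roughly like this: producer threads hand over complete frames, and a single writer thread is the only one that touches the file, so no frame can be split by another. The class name `QueuedFileWriter`, the queue capacity of 10,000 and the poison-pill shutdown are all assumptions made for the sketch.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-writer queue; producers never touch the file.
public class QueuedFileWriter implements AutoCloseable {
    // Sentinel compared by identity to tell the writer thread to stop.
    private static final byte[] POISON = new byte[0];

    // Bounded, so producers block instead of exhausting memory if the disk falls behind.
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(10_000);
    private final Thread writer;

    public QueuedFileWriter(String path) {
        writer = new Thread(() -> drain(path), "record-writer");
        writer.start();
    }

    // Called by any number of threads; each element must be one complete frame.
    public void submit(byte[] framed) throws InterruptedException {
        queue.put(framed);
    }

    // Runs on the writer thread only, so writes to the file never interleave.
    private void drain(String path) {
        try (OutputStream out = new FileOutputStream(path, true)) {
            while (true) {
                byte[] next = queue.take();
                if (next == POISON) {
                    break;
                }
                out.write(next);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the writer finish everything queued before it, then waits for it to exit.
    @Override
    public void close() throws InterruptedException {
        queue.put(POISON);
        writer.join();
    }
}
```

Producers only pay the cost of a queue insertion, and the file is opened once instead of once per record. The trade-off is that a record is not on disk when `submit` returns.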

Lucas Noetzold