
I have to read a text file from my Java application.

The file contains many rows, and it is updated every X minutes by an external, unknown application that appends new lines to it.

I have to read all the rows from the file and then I have to delete all the records that I've just read.

Is it possible to read the file row by row, deleting each row as I read it, while at the same time allowing the external application to append new rows to the file?

The file is located in a Samba shared folder, so I'm using jCIFS to read/write the file, together with Java's `BufferedReader` class.
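
For reference, this is roughly the kind of read loop I mean (a minimal sketch assuming jCIFS 1.x; the server URL and credentials below are placeholders):

```
import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;
import jcifs.smb.SmbFileInputStream;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class SmbLineReader {
    public static void main(String[] args) throws IOException {
        // Placeholder credentials and share path -- adjust to your environment.
        NtlmPasswordAuthentication auth =
                new NtlmPasswordAuthentication("DOMAIN", "user", "password");
        SmbFile file = new SmbFile("smb://server/share/data.txt", auth);

        // Read the remote file line by line through jCIFS + BufferedReader.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new SmbFileInputStream(file)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the row, e.g. insert it into the MySQL table
                System.out.println(line);
            }
        }
    }
}
```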

Thanks in advance.

Roberto Milani
  • Changing a file which is written to by an application which is not under your control is a bad idea. Why do you need to delete? Perhaps it would be enough just to maintain a marker of how many lines you have read so far, without changing the file? – RealSkeptic Aug 04 '16 at 15:25
  • This is the sort of thing that `Socket Writing` and `RESTful POST` commands were made for. – Susannah Potts Aug 04 '16 at 15:26
  • @RealSkeptic I need to delete or update the lines that I've just read because I think that's the easiest way to mark the rows as "already processed". After processing a row, I have to store it in a MySQL table, so I don't need to leave the rows in the file anymore. – Roberto Milani Aug 04 '16 at 16:03
  • @SusannahPotts I know but unfortunately the external application is not under my control :) – Roberto Milani Aug 04 '16 at 16:04
  • @RobertoMilani This is true, I just couldn't hold that thought to myself. – Susannah Potts Aug 04 '16 at 16:05
  • *How* would you delete rows from such a file? To delete data at the beginning of a file, you have to rewrite the rest of the file. Yet there's no way to know if the other application has written more to the file when you try to rewrite it - and even if you come up with a way to know, you'd have to prevent the other application from writing to the file while you're rewriting it. The entire idea is fundamentally broken. – Andrew Henle Aug 04 '16 at 16:32

3 Answers


I don't know the perfect solution to your problem, but I would solve it differently:

  • rename the file (give it a unique name with a timestamp); a rough jCIFS sketch of this step follows the list
  • the appender job will then automatically re-create it
  • process your time-stamped files (no need to delete them; keep them in place so you can later check what happened)
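
A rough sketch of that rename step over jCIFS (the share URL, credentials and timestamp format are placeholders, and it assumes the appender simply recreates the file once it is missing):

```
import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class SmbFileRotator {
    public static void main(String[] args) throws IOException {
        // Placeholder credentials and paths -- adjust to your environment.
        NtlmPasswordAuthentication auth =
                new NtlmPasswordAuthentication("DOMAIN", "user", "password");
        SmbFile source = new SmbFile("smb://server/share/data.txt", auth);

        // Give the snapshot a unique, timestamped name.
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        SmbFile snapshot = new SmbFile("smb://server/share/data-" + stamp + ".txt", auth);

        if (source.exists()) {
            // Rename the live file; the appender is expected to recreate it
            // on its next write. The snapshot can then be processed at leisure
            // and kept around for auditing.
            source.renameTo(snapshot);
            System.out.println("Rotated to " + snapshot.getName());
        }
    }
}
```

Note that this relies on the appender opening the file anew for each batch of writes; if it keeps the file handle open, the rename may fail or new lines may keep going to the renamed file.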
Sean Patrick Floyd

The problem is that we don't know how the external application writes to and/or reuses this file. Deleting rows could break it if, for example, the external application relies on a line counter or file offset to run correctly...

There is no good solution unless you know how the other app works.

Jim Garrison
N0un

Is it possible to read the file row by row, deleting each row as I read it, while at the same time allowing the external application to append new rows to the file?

Yes, you can open the same file for reading and writing from multiple processes. In Linux, for example, you will get two separate file descriptors for the same file. For file writes under the size of PIPE_BUF, or 4096 bytes in Linux, it is safe to assume the operations are atomic, meaning the kernel is handling the locking and unlocking to prevent race conditions.

Assuming Process A, which is writing to the file, has opened it in APPEND mode, then each time Process A tells the kernel to write() it will first seek to the size of the file (the end of the file). That means you can safely delete data in the file from Process B as long as it is done in between the write operations of Process A. And as long as the write operations from Process A don't exceed PIPE_BUF, Linux guarantees they will be atomic, i.e. Process A can spam write operations and Process B can constantly delete/write data, and no funky behavior will result.

Java provides file locks out of the box (`java.nio.channels.FileLock`). But it's important to understand that they are only "advisory," not "mandatory": Java does not enforce the restriction, so both processes must explicitly check whether another process holds the lock.
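
For illustration, here is a minimal sketch of that advisory locking with `java.nio.channels.FileLock` (the path is a placeholder, and it only helps if every process touching the file cooperates; whether such locks propagate over a jCIFS/SMB share is a separate question):

```
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AdvisoryLockExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data.txt"); // placeholder path

        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // tryLock() returns null if another cooperating process holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                System.out.println("File is locked by another process; try again later.");
                return;
            }
            try {
                // Safe to read/modify the file here -- but only if every other
                // process also acquires the lock before touching the file.
                System.out.println("Lock acquired, file size = " + channel.size());
            } finally {
                lock.release();
            }
        }
    }
}
```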

Mike S
  • Locking? Linux will handle it? Can you show documentation which confirms this assertion? – RealSkeptic Aug 04 '16 at 15:31
  • @RealSkeptic I can't find an official post from Linus Torvalds. But if you google it you will find an abundance of evidence in books, websites, and Operating Systems courses. [Here is one example](http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node27.html). Also, more [stackoverflow support](http://stackoverflow.com/a/2751750/1241782). – Mike S Aug 04 '16 at 15:52
  • *For file writes under the size of PIPE_BUF, or 4096 bytes in Linux, it is safe to assume the operations are atomic, meaning the kernel is handling the locking and unlocking to prevent race conditions.* Usually true - but in this case it's a shared file system that's apparently being updated across network connections. Reading from a file being written to is hard enough to do reliably with low-level C code on a local file - doing it in Java across a network *and* modifying the file concurrently with the writing process is going to be extremely difficult at best. – Andrew Henle Aug 04 '16 at 16:27
  • Especially as input is usually buffered to an unknown size, which may well exceed that limit. – RealSkeptic Aug 04 '16 at 16:37
  • @RealSkeptic inputs are buffered at high levels, like programming languages or network protocols. The kernel is not so carefree because accuracy and atomicity are extremely important. The kernel will not buffer these writes, it will either block until unlocked or return an errno, depending on whether O_NONBLOCK flag is set. – Mike S Aug 04 '16 at 17:20
  • The kernel won't buffer; it will just write them as they come. You can't expect atomicity for large buffers you write, and that's exactly the problem. – RealSkeptic Aug 04 '16 at 17:30
  • @RealSkeptic correct, that's why I wrote that in my answer. – Mike S Aug 04 '16 at 17:33
  • The problem is that your answer seems to imply that it's OK to read and write, when in fact it's only OK if you have control over the two processes and you take care not to buffer, and you also know that no underlying implementation layers are buffering (e.g. CIFS library, network). – RealSkeptic Aug 04 '16 at 17:41