4

In one requirement, I need to copy multiple files from one location to another network location.

Let's assume that the following files are present in the /src location:
a.pdf, b.pdf, a.doc, b.doc, a.txt and b.txt

I need to copy the a.pdf, a.doc and a.txt files into the /dest location atomically, all at once.

Currently I am using the java.nio.file.Files package, with code as follows:

Path srcFile1 = Paths.get("/src/a.pdf");
Path destFile1 = Paths.get("/dest/a.pdf");

Path srcFile2 = Paths.get("/src/a.doc");
Path destFile2 = Paths.get("/dest/a.doc");

Path srcFile3 = Paths.get("/src/a.txt");
Path destFile3 = Paths.get("/dest/a.txt");

Files.copy(srcFile1, destFile1);
Files.copy(srcFile2, destFile2);
Files.copy(srcFile3, destFile3);

But with this process the files are copied one after another.
As an alternative, in order to make the whole process atomic, I am thinking of zipping all the files, moving the archive to /dest, and unzipping it at the destination.

Is this approach correct for making the whole copy process atomic? Has anyone experienced a similar problem and resolved it?

Dhana
  • 813
  • 1
  • 8
  • 19
  • 1
    You could also copy them one by one, first with a .tmp file extension, and then rename them. But what is your goal? – J. Doe Sep 10 '20 at 19:22
  • @J.Doe Copying multiple files one by one is not an atomic action, right? Think of a transaction where data for multiple tables is stored in one shot; I want the same thing here. – Dhana Sep 11 '20 at 12:41
  • I don't think there's a way to get pure atomicity where you are guaranteed to either get exactly what you want or have no change to the filesystem occur whatsoever. But you can get close doing something like what @J.Doe suggests. I had a similar thought, but mine was to first copy the files to a hidden directory in the directory you really want to copy them to. Then you'd move them into place after the copy. You could be pretty confident that the 3 moves would succeed and work quickly, but there would still be a short time when only 1 or 2 of the files would be there. – CryptoFool Sep 24 '20 at 04:38
  • If you want the above scenario to happen as a single transaction, I would suggest you use the Stream API introduced in Java 8. First, insert the byte stream for each file, separated by a character, into the Stream object, then send it across the network. Upon reaching the destination location, you can iterate over the Stream object and write each byte stream to a location. Also, if you need to maintain the file format (.docx, .pdf, .txt), you should use a Map object defined as `Map` and send this Map object as a Stream object over the network. – Ayush28 Sep 24 '20 at 04:41
  • Your solution of zip and copy is right and atomic, so if there is a single problem, no file will appear in the destination directory. Just make sure that you compress your files in a temp directory. – Halayem Anis Sep 25 '20 at 12:06

5 Answers

2

Is this approach correct for making the whole copy process atomic? Has anyone experienced a similar problem and resolved it?

You can copy the files to a new temporary directory and then rename the directory.

Before renaming your temporary directory, you need to delete the destination directory.

If there are already other files in the destination directory that you don't want to overwrite, you can instead move all files from the temporary directory to the destination directory.

This is not completely atomic, however.

With removing /dest:

// The temp directory must be on the same partition as /dest,
// otherwise File.renameTo will fail.
String tmpPath = "/tmp/in/same/partition/as/dest";
File tmp = new File(tmpPath);
tmp.mkdirs();

Path srcFile1 = Paths.get("/src/a.pdf");
Path tmpFile1 = Paths.get(tmpPath + "/a.pdf");

Path srcFile2 = Paths.get("/src/a.doc");
Path tmpFile2 = Paths.get(tmpPath + "/a.doc");

Path srcFile3 = Paths.get("/src/a.txt");
Path tmpFile3 = Paths.get(tmpPath + "/a.txt");

Files.copy(srcFile1, tmpFile1);
Files.copy(srcFile2, tmpFile2);
Files.copy(srcFile3, tmpFile3);

delete(new File("/dest"));
if (!tmp.renameTo(new File("/dest")))
    throw new IOException("Failed to rename " + tmp + " to /dest");

void delete(File f) throws IOException {
  if (f.isDirectory()) {
    for (File c : f.listFiles())
      delete(c);
  }
  if (!f.delete())
    throw new FileNotFoundException("Failed to delete file: " + f);
}

With just overwriting the files:

// Again, keep the temp directory on the same partition as /dest
// so that the final moves are simple renames.
String tmpPath = "/tmp/in/same/partition/as/dest";
File tmp = new File(tmpPath);
tmp.mkdirs();

Path srcFile1 = Paths.get("/src/a.pdf");
Path destFile1 = Paths.get("/dest/a.pdf");
Path tmp1 = Paths.get(tmpPath + "/a.pdf");

Path srcFile2 = Paths.get("/src/a.doc");
Path destFile2 = Paths.get("/dest/a.doc");
Path tmp2 = Paths.get(tmpPath + "/a.doc");

Path srcFile3 = Paths.get("/src/a.txt");
Path destFile3 = Paths.get("/dest/a.txt");
Path tmp3 = Paths.get(tmpPath + "/a.txt");

Files.copy(srcFile1, tmp1);
Files.copy(srcFile2, tmp2);
Files.copy(srcFile3, tmp3);

// Start of non-atomic section (it can be done again if necessary)

Files.deleteIfExists(destFile1);
Files.deleteIfExists(destFile2);
Files.deleteIfExists(destFile3);

Files.move(tmp1, destFile1);
Files.move(tmp2, destFile2);
Files.move(tmp3, destFile3);

// End of non-atomic section

Even though the second method contains a non-atomic section, the copy process itself uses a temporary directory, so the destination files are not overwritten while the copying is still in progress.

If the process aborts during moving the files, it can easily be completed.

See https://stackoverflow.com/a/4645271/10871900 as reference for moving files and https://stackoverflow.com/a/779529/10871900 for recursively deleting directories.

dan1st
  • 12,568
  • 8
  • 34
  • 67
2

First, there are several possibilities for copying a file or a directory. Baeldung gives a very nice insight into the different possibilities. Additionally, you can also use FileCopyUtils from Spring. Unfortunately, none of these methods is atomic.

I found an older post and adapted it a little. You can try using Spring's low-level transaction management support. That means you make a transaction out of the method and define what should be done on a rollback. There is also a nice article from Baeldung about this.

@Autowired
private PlatformTransactionManager transactionManager;

@Transactional(rollbackFor = IOException.class)
public void copy(List<File> files) throws IOException {
    TransactionDefinition transactionDefinition = new DefaultTransactionDefinition();
    TransactionStatus transactionStatus = transactionManager.getTransaction(transactionDefinition);

    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {

        @Override
        public void afterCompletion(int status) {
            if (status == STATUS_ROLLED_BACK) {
                // try to delete the created files
            }
        }
    });

    try {
        // copy files
        transactionManager.commit(transactionStatus);
    } catch (IOException | RuntimeException e) {
        // only roll back on failure; rolling back in a finally block
        // would also run after a successful commit and fail
        transactionManager.rollback(transactionStatus);
        throw e;
    }
}

Alternatively, you can use a simple try-catch block: if an exception is thrown, you delete the files created so far.
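A minimal sketch of that try-catch variant (the class and method names here are invented for illustration): copy into the destination, remember what was written, and delete the partial results if any copy fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CleanupOnFailureCopy {

    /** Copies each source into destDir; on failure, deletes the partial copies and rethrows. */
    public static void copyAllOrNothing(List<Path> sources, Path destDir) throws IOException {
        List<Path> written = new ArrayList<>();
        try {
            for (Path src : sources) {
                Path dest = destDir.resolve(src.getFileName());
                Files.copy(src, dest);
                written.add(dest);
            }
        } catch (IOException e) {
            // roll back: remove everything copied so far
            for (Path p : written) {
                Files.deleteIfExists(p);
            }
            throw e;
        }
    }
}
```

Note that this only cleans up after a failed batch; a reader who sees the destination mid-copy can still observe a partial set of files.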

flaxel
  • 4,173
  • 4
  • 17
  • 30
0

Your question doesn't state the goal of the atomicity. Even unzipping is never atomic: the VM might crash with an OutOfMemoryError right in the middle of inflating the blocks of the second file. Then one file is complete, a second is partial, and a third is entirely missing.

The only thing I can think of is a two-phase commit, like all the suggestions with a temporary destination that suddenly becomes the real target. This way you can be sure that the second operation either never occurs or creates the final state.
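That two-phase idea could be sketched like this (an illustrative class, not the poster's code; it assumes the staging directory and the final directory live on the same filesystem, which ATOMIC_MOVE generally requires, and that the final directory does not exist yet):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class TwoPhaseCopy {

    /** Phase 1: stage all files in a sibling temp dir. Phase 2: one rename into place. */
    public static void commitCopy(List<Path> sources, Path finalDir) throws IOException {
        // Staging dir is created next to finalDir so the rename stays on one filesystem.
        Path staging = Files.createTempDirectory(finalDir.getParent(), ".staging-");
        for (Path src : sources) {
            Files.copy(src, staging.resolve(src.getFileName()));
        }
        // The "commit": a single rename that either happens completely or not at all.
        Files.move(staging, finalDir, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the process dies during phase 1, only a hidden staging directory is left behind; the real target never holds a partial batch.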

Another approach would be to write a sort of cheap checksum file into the target directory afterwards. That makes it easy for an external process to listen for the creation of such files and verify their content against the files found.
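One way such a checksum file could look, as a hedged sketch (the `.manifest` file name and its format are invented for the example): compute a SHA-256 per file and write one line per file; a watcher treats the manifest's appearance as the "batch complete" signal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class ChecksumManifest {

    /** Writes "<sha256>  <name>" lines to .manifest; write it last, after all files arrive. */
    public static Path writeManifest(Path dir, List<String> names) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String name : names) {
                md.reset();
                for (byte b : md.digest(Files.readAllBytes(dir.resolve(name)))) {
                    sb.append(String.format("%02x", b)); // hex-encode the digest
                }
                sb.append("  ").append(name).append('\n');
            }
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e); // SHA-256 is mandatory in every JRE, so this shouldn't happen
        }
        Path manifest = dir.resolve(".manifest");
        Files.writeString(manifest, sb.toString());
        return manifest;
    }
}
```

The format mirrors the familiar `sha256sum` output, so the receiving side could even verify it with standard tools.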

The latter is much the same as offering a container/ZIP/archive right away instead of piling files into a directory; most archive formats have or support integrity checks.

(Operating systems and file systems also differ in behaviour if directories or folders disappear while being written. Some accept it and write all data to a recoverable buffer. Others still accept writes but don't change anything. Others fail immediately upon first write since the target block on the device is unknown.)

motzmann
  • 151
  • 6
0

FOR ATOMIC WRITE:

There is no transaction concept for standard filesystems, so you need to reduce everything to a single action - that one action is what can be atomic.

Therefore, for writing more files in an atomic way, you need to create a folder with, let's say, the timestamp in its name, and copy files into this folder.

Then, you can either rename it to the final destination or create a symbolic link.

You can use anything similar to this, like file-based volumes on Linux, etc.

Remember that deleting the existing symbolic link and creating a new one will never be atomic, so you would need to handle the situation in your code and switch to the renamed/linked folder once it's available instead of removing/creating a link. However, under normal circumstances, removing and creating a new link is a really fast operation.
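A sketch of that link-switching scheme in Java (illustrative names; it assumes a POSIX filesystem, where renaming a symlink over an existing one is a single atomic step and creating symlinks needs no special privileges):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class SymlinkSwitch {

    /** Copies files into a fresh versioned dir, then atomically repoints the "current" link. */
    public static void publish(List<Path> sources, Path baseDir, Path currentLink) throws IOException {
        Path versionDir = Files.createTempDirectory(baseDir, "v-");
        for (Path src : sources) {
            Files.copy(src, versionDir.resolve(src.getFileName()));
        }
        // Build the new link under a temporary name, then rename it over the old one.
        // On POSIX the rename replaces the existing link in one atomic step;
        // readers always see either the old complete set or the new complete set.
        Path tmpLink = baseDir.resolve(".current-tmp");
        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, versionDir);
        Files.move(tmpLink, currentLink, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Old version directories accumulate under baseDir, so a real deployment would also prune any version the link no longer points at.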

FOR ATOMIC READ:

Well, the problem is not in the code, but at the operating system/filesystem level.

Some time ago, I got into a very similar situation. There was a database engine running and changing several files "at once". I needed to copy the current state, but the second file was already changed before the first one was copied.

There are two different options:

  • Use a filesystem with support for snapshots: at some moment, you create a snapshot and then copy the files from it.
  • Lock the filesystem (on Linux) using fsfreeze --freeze, and unlock it later with fsfreeze --unfreeze. While the filesystem is frozen, you can read the files as usual, but no process can change them.

Neither of these options worked for me, as I couldn't change the filesystem type, and locking the filesystem wasn't possible (it was the root filesystem).

So I created an empty file, mounted it as a loop filesystem, and formatted it. From that moment on, I could fsfreeze just my virtual volume without touching the root filesystem.

My script first called fsfreeze --freeze /my/volume, then performed the copy action, and then called fsfreeze --unfreeze /my/volume. For the duration of the copy, the files couldn't be changed, so the copied files were all from exactly the same moment in time - for my purpose, it was like an atomic operation.

Btw, be sure not to fsfreeze your root filesystem :-). I did, and a restart was the only solution.

DATABASE-LIKE APPROACH:

Even databases cannot rely on atomic file operations, so they first write the change to a WAL (write-ahead log) and flush it to storage. Only once it's flushed do they apply the change to the data file.

If there is any problem/crash, the database engine first loads the data file, checks whether there are unapplied transactions in the WAL, and applies them if necessary.

This is also called journaling, and it's used by some filesystems (ext3, ext4).
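As a toy illustration of the same journaling idea applied to a file-copy batch (all names invented for the example; a real WAL would also fsync the journal before applying the changes):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;

public class JournaledCopy {

    /** Write-ahead style: record the intent, do the work, then clear the record. */
    public static void copyWithJournal(List<Path> sources, Path destDir) throws IOException {
        Path journal = destDir.resolve(".copy-journal");
        String intent = sources.stream()
                .map(p -> p.getFileName().toString())
                .collect(Collectors.joining("\n"));
        Files.writeString(journal, intent);                  // 1. log what we are about to do
        for (Path src : sources) {
            Files.copy(src, destDir.resolve(src.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING); // 2. apply (redo-safe: idempotent)
        }
        Files.delete(journal);                               // 3. mark the batch as committed
    }

    /** On startup: an existing journal means the last batch may be incomplete. */
    public static boolean hasPendingBatch(Path destDir) {
        return Files.exists(destDir.resolve(".copy-journal"));
    }
}
```

Because the copies overwrite, a recovery process can simply re-run the batch listed in a leftover journal, just as a database replays its WAL.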

Václav Hodek
  • 638
  • 4
  • 9
0

I hope this solution is useful. As per my understanding, you need to copy the files from one directory to another directory, so my solution is as follows. Thank you!

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class CopyFilesDirectoryProgram {

    public static void main(String[] args) throws IOException {
        String sourceDirectoryName = "//mention your source path";
        String targetDirectoryName = "//mention your destination path";
        File sdir = new File(sourceDirectoryName);
        File tdir = new File(targetDirectoryName);
        // call the method for execution
        copy(sdir, tdir);
    }

    private static void copy(File sdir, File tdir) throws IOException {
        if (sdir.isDirectory()) {
            copyFilesFromDirectory(sdir, tdir);
        } else {
            Files.copy(sdir.toPath(), tdir.toPath());
        }
    }

    private static void copyFilesFromDirectory(File source, File target) throws IOException {
        if (!target.exists()) {
            target.mkdir();
        }
        // copy the children in both cases, not only when the target already exists
        for (String item : source.list()) {
            copy(new File(source, item), new File(target, item));
        }
    }
}

Reshma S
  • 1
  • 1