
After spending about two hours searching for "java get file CRC from MFT" and dozens of variants, I finally ended up here asking "the office":

In Java 8+ on a Debian-based system, is there a way to read a file's CRC without reading/processing the file?

Thanks everyone

Benj
  • Hey, this question may need more focus. First you're asking how to get a CRC from the MFT, which is a component of NTFS and therefore Windows-only. Then you want it on Linux, which uses a different file system. I suggest rephrasing the question to ask how to get a CRC from the filesystem, and leaving the part about the solution not requiring Java for another question. It would help if you included any information your research has found so far. – CausingUnderflowsEverywhere Jan 19 '20 at 17:55
  • @CausingUnderflowsEverywhere you're right, I wrote it in a weird way – Benj Jan 19 '20 at 17:59
  • Maybe there's another way to solve your problem? Why can't you read the file from the filesystem? – CausingUnderflowsEverywhere Jan 19 '20 at 18:25
  • @CausingUnderflowsEverywhere when you've got a bunch of GB-sized files to compare, it would be nice to just have a hash – Benj Jan 19 '20 at 18:27
  • Could you suggest that the end user handle this on the hardware side? A single NVMe PCIe SSD can reach read speeds of 2GB/s. If they have the budget, they could build a RAID setup of NVMe PCIe SSDs, which could unlock 10GB/s read speeds. Keep in mind that something will have to read the file at some point to generate the CRC, so the time it takes to process a file that large is unavoidable. Here are some benchmarks of an SSD showing an average read speed of 2GB/s: https://ssd.userbenchmark.com/SpeedTest/693540/Samsung-SSD-970-EVO-Plus-1TB – CausingUnderflowsEverywhere Jan 19 '20 at 18:37
  • Are you sure you need a CRC? Maybe you're looking for something else that doesn't involve reading entire files. – CausingUnderflowsEverywhere Jan 19 '20 at 18:41
  • @CausingUnderflowsEverywhere I cannot modify the hardware (it's a remote server), and the users shouldn't have to deal with technical problems; it has to be transparent to them. – Benj Jan 20 '20 at 09:47

1 Answer


Most filesystems don't store a CRC checksum for each file, as recalculating it on every write would be computationally expensive. This means there is no place to look up a file's checksum, and Java therefore doesn't expose an API for it.
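To illustrate what Java does expose: `BasicFileAttributes` reports metadata such as size and timestamps without reading any file data, but it has no checksum field. A minimal sketch, assuming the goal is comparing files: comparing sizes can rule out equality cheaply, though it can never prove two files are equal. The names `SizePreCheck` and `mightBeEqual` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class SizePreCheck {
    // Returns false when the files certainly differ (different sizes).
    // Returns true when they *might* be equal; only reading the contents can confirm it.
    static boolean mightBeEqual(Path a, Path b) throws IOException {
        // Attributes come from filesystem metadata; no file data is read.
        BasicFileAttributes attrA = Files.readAttributes(a, BasicFileAttributes.class);
        BasicFileAttributes attrB = Files.readAttributes(b, BasicFileAttributes.class);
        // BasicFileAttributes offers size() and timestamps, but no checksum.
        return attrA.size() == attrB.size();
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("a", ".bin");
        Path b = Files.createTempFile("b", ".bin");
        Path c = Files.createTempFile("c", ".bin");
        Files.write(a, new byte[] {1, 2, 3});
        Files.write(b, new byte[] {4, 5, 6});
        Files.write(c, new byte[] {1, 2});
        System.out.println(mightBeEqual(a, b)); // true: same size, contents still unknown
        System.out.println(mightBeEqual(a, c)); // false: sizes differ
        Files.delete(a);
        Files.delete(b);
        Files.delete(c);
    }
}
```

Only the pairs that survive this check would then need a full checksum.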

@Thymine wrote an answer to a similar question explaining why getting a checksum without reading the file is not possible: https://stackoverflow.com/a/7812413

You can, however, generate a checksum by reading the file, as shown in this code example. But you probably already know this.

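import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class FileCrc {
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get("path/to/file.ext")); // loads the whole file into memory
        CRC32 checksum = new CRC32(); // CRC32.update(byte[]) works on Java 8; Checksum.update(byte[]) needs Java 9+
        checksum.update(data);
        System.out.println("CRC32 Checksum: " + checksum.getValue());
    }
}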
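Because `Files.readAllBytes` loads the entire file into one array (and throws `OutOfMemoryError` for files larger than about 2GB), GB-sized files are better checksummed as a stream. A minimal sketch using `java.util.zip.CheckedInputStream`, which updates a `Checksum` as bytes pass through it. It still reads every byte, so it saves memory, not I/O time. The class name `StreamingCrc` and the 64 KiB buffer size are arbitrary choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class StreamingCrc {
    // Computes the CRC32 of a file by streaming it in fixed-size chunks,
    // so memory use stays constant regardless of file size.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            byte[] buffer = new byte[64 * 1024]; // 64 KiB chunks, an arbitrary size
            while (in.read(buffer) != -1) {
                // CheckedInputStream feeds every byte read into crc; nothing else to do
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "path/to/file.ext");
        System.out.println("CRC32 Checksum: " + crc32(file));
    }
}
```

For the same file, this yields the same value as the in-memory version above.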
  • I would rather avoid reading the files, as they are numerous and GB-sized, and the comparison has to be as quick as possible. – Benj Jan 19 '20 at 18:28
  • I found a partial solution: since the only updates made to the files are appends, I will store a kind of blockchain of the updates in a database, so the expensive computation happens only once. So I'm accepting your answer, and thank you for your time. – Benj Jan 20 '20 at 09:48
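A minimal sketch of one variant of that append-only idea, under stated assumptions: the file is only ever appended to (earlier bytes never change), and the stored `State` is what you would persist in the database. Instead of chaining over individual updates, it chains SHA-256 over fixed-size blocks, so the fingerprint depends only on the file contents, not on how the appends were split. Each refresh reads only the data after the last complete block. All names (`AppendOnlyFingerprint`, `State`, `Result`, `refresh`) and the 1 MiB block size are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class AppendOnlyFingerprint {
    static final int BLOCK_SIZE = 1 << 20; // 1 MiB per block, an arbitrary choice

    // Hypothetical per-file state to persist in the database between runs.
    static final class State {
        final long completeBlocks; // full blocks already folded into the chain
        final byte[] chain;        // running SHA-256 chain over those blocks
        State(long completeBlocks, byte[] chain) {
            this.completeBlocks = completeBlocks;
            this.chain = chain;
        }
        static State initial() { return new State(0, new byte[32]); }
    }

    static final class Result {
        final State state;        // store this for the next run
        final byte[] fingerprint; // compare this between files
        Result(State state, byte[] fingerprint) {
            this.state = state;
            this.fingerprint = fingerprint;
        }
    }

    // SHA-256 over prefix || data[0..len)
    static byte[] sha256(byte[] prefix, byte[] data, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(prefix);
            md.update(data, 0, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    // Reads only the bytes after the last complete block recorded in prev.
    static Result refresh(Path file, State prev) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long blocks = prev.completeBlocks;
            byte[] chain = prev.chain;
            ch.position(blocks * BLOCK_SIZE);
            ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
            while (true) {
                buf.clear();
                // Fill the buffer completely, or stop at end of file.
                while (buf.hasRemaining() && ch.read(buf) != -1) { }
                if (buf.position() < BLOCK_SIZE) break; // partial (possibly empty) tail
                chain = sha256(chain, buf.array(), BLOCK_SIZE);
                blocks++;
            }
            // The tail goes into the fingerprint but not into the stored chain,
            // because later appends may complete it into a full block.
            byte[] fingerprint = sha256(chain, buf.array(), buf.position());
            return new Result(new State(blocks, chain), fingerprint);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append", ".bin");
        byte[] first = new byte[BLOCK_SIZE + 123];
        byte[] second = new byte[BLOCK_SIZE];
        Arrays.fill(first, (byte) 1);
        Arrays.fill(second, (byte) 2);

        Files.write(file, first);
        Result r1 = refresh(file, State.initial());
        Files.write(file, second, StandardOpenOption.APPEND);
        Result incremental = refresh(file, r1.state); // reads ~1 MiB instead of ~2 MiB
        Result full = refresh(file, State.initial());  // rereads everything
        System.out.println(Arrays.equals(incremental.fingerprint, full.fingerprint)); // true
        Files.delete(file);
    }
}
```

The trade-off: each refresh rereads at most one partial block plus the newly appended data, and if a file is ever modified in place rather than appended to, the stored state silently becomes wrong.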