
With the example below, I can identify the overall changes, but I cannot get the strings that were edited or added separately. Is there an algorithm or approach to detect whether a string within a file was edited, added, or deleted? I have tried the Java File Watcher, but it only detects whether a file was created, modified, or deleted. It does not tell me what changes were made within the file.

The diffFiles function just checks whether a string is present in both files. I made a copy of the base file and check the differences against it:

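import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Returns every line of the first file that occurs nowhere in the second file,
// mapped to its 1-based line number in the first file.
public Map<String, Integer> diffFiles(List<String> firstFileContent, List<String> secondFileContent) {
    final Set<String> secondLines = new HashSet<>(secondFileContent); // constant-time lookups
    final Map<String, Integer> diff = new HashMap<>();
    int lineNumber = 0;
    for (final String line : firstFileContent) {
        lineNumber += 1;
        if (!secondLines.contains(line)) {
            diff.put(line, lineNumber); // a duplicated line keeps only its last line number
        }
    }
    return diff;
}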

I want to identify, for each individual string in the file, whether it was edited or added.

Vega
BipLab

3 Answers


You may use a class called Checksum. It is used to check that a complete message has been received: a checksum lets you detect whether any bits were lost or altered.

chu3la
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/low-quality-posts/26499407) – ThisaruG Jun 24 '20 at 11:28
  • Really? So how can I provide a complete answer? I won't give a complete answer, just guide toward the solution. – chu3la Jun 24 '20 at 12:02

Here are some ways you can do that:

Checksum

A checksum is a short value computed from your data.

First, get the bytes of your content:

var content = "this is my file content";
var b = content.getBytes(StandardCharsets.UTF_8);

Then, to calculate the checksum for each of your files:

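import java.util.zip.CRC32;
import java.util.zip.Checksum;

// CRC32 over the whole byte array; equal inputs always give equal values.
public static long getChecksum(byte[] bytes) {
    Checksum crc32 = new CRC32();
    crc32.update(bytes, 0, bytes.length);
    return crc32.getValue();
}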

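If both long values are equal, the contents are almost certainly identical (two different inputs can produce the same CRC32, but this is rare). If they differ, the contents are definitely different.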
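To compare two files on disk, you can read each file's bytes and compare the two checksums. A minimal sketch, assuming each file fits in memory; the class name ChecksumCompare and the method sameChecksum are hypothetical helpers, not part of any library:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumCompare {

    // CRC32 over the whole byte array (same helper as above).
    public static long getChecksum(byte[] bytes) {
        Checksum crc32 = new CRC32();
        crc32.update(bytes, 0, bytes.length);
        return crc32.getValue();
    }

    // Hypothetical helper: true when both files have the same CRC32 value.
    // Equal values make identical content very likely, not certain.
    public static boolean sameChecksum(Path first, Path second) throws IOException {
        return getChecksum(Files.readAllBytes(first)) == getChecksum(Files.readAllBytes(second));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempFile("base", ".txt");
        Path copy = Files.createTempFile("copy", ".txt");
        Files.write(base, "this is my file content".getBytes(StandardCharsets.UTF_8));
        Files.write(copy, "this is my file content".getBytes(StandardCharsets.UTF_8));
        System.out.println(sameChecksum(base, copy)); // true
    }
}
```

Files.readAllBytes loads each file fully into memory; for very large files, java.util.zip.CheckedInputStream can compute the CRC while streaming instead.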

Apache Commons Codec

You could also compute a SHA-256 hash with Apache Commons Codec:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.11</version>
</dependency>

And compute the hash with:

import org.apache.commons.codec.digest.DigestUtils;
String sha = DigestUtils.sha256Hex(yourFullFileContentString);

If both strings (e.g. sha) are equal, the contents are identical.

Guava Library

Google's Guava library offers the same capability:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>20.0</version>
</dependency>

And here is the code:

import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
var sha = Hashing.sha256()
  .hashString(yourFullFileContentString, StandardCharsets.UTF_8).toString();
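If you would rather not add a dependency, the JDK's own java.security.MessageDigest can compute the same SHA-256 digest. A minimal sketch, assuming the content is encoded as UTF-8; the class Sha256Hex is a hypothetical helper, and its lowercase hex output should match what the two library calls above produce for the same UTF-8 input:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Hex {

    // Lowercase hex SHA-256 of a string, using only the JDK.
    public static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("this is my file content"));
    }
}
```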

Which one to choose

I would choose the Checksum: comparing file contents does not need a cryptographic hash such as SHA-256, and CRC32 is generally cheaper to compute.

Patrick Santana

With your implementation of diffFiles(), you will get all the lines that are in the first file but missing from the second.

It won't give you the lines that are in the second file but not in the first. And it will report lines that have moved to a different location in the second file as 'unchanged'.
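Both points are easy to see with a small sketch that runs the same set-membership check in both directions. The class LineSetDiff and the method missingFrom are hypothetical names used only for illustration. Lines present only in the edited file look "added", but the moved line is invisible and the fixed typo shows up as one removal plus one addition:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LineSetDiff {

    // Lines of `from` that occur nowhere in `to` (order preserved, duplicates kept).
    public static List<String> missingFrom(List<String> from, List<String> to) {
        Set<String> target = new HashSet<>(to);
        List<String> result = new ArrayList<>();
        for (String line : from) {
            if (!target.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base = List.of("alpha", "bravo", "charly");
        // "bravo" moved to the end, the typo "charly" was fixed, "delta" is new.
        List<String> edited = List.of("alpha", "charlie", "delta", "bravo");
        System.out.println("removed: " + missingFrom(base, edited)); // removed: [charly]
        System.out.println("added: " + missingFrom(edited, base));   // added: [charlie, delta]
    }
}
```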

And as you already noticed, you cannot tell whether a line was added (inserted) or whether an existing line was just modified (a typo fixed, for example).


What you are asking for is basically a Java implementation of the 'diff' tool, and Stack Overflow already has a number of answers for that:

There may be more. Some of those answers just suggest using a library, and others stop short of your desired solution, but all of them should give you an idea of how to proceed.
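For orientation, the core of such a diff can be a longest common subsequence (LCS) over lines: lines in the LCS are unchanged, lines of the first file outside it were deleted, and lines of the second file outside it were added. A minimal sketch, assuming both files fit in memory as List&lt;String&gt; (for example from Files.readAllLines); the class LineDiff and its "  "/"- "/"+ " output format are hypothetical. It uses O(n*m) time and memory, whereas real diff tools typically use Myers' algorithm, which is faster on similar files:

```java
import java.util.ArrayList;
import java.util.List;

public class LineDiff {

    // Each output entry is "  " + line (unchanged), "- " + line (deleted from a)
    // or "+ " + line (added in b).
    public static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the start, emitting one entry per step.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i)); // skipping a[i] keeps the LCS length
                i++;
            } else {
                out.add("+ " + b.get(j)); // skipping b[j] keeps the LCS length
                j++;
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> base = List.of("alpha", "bravo", "charly");
        List<String> edited = List.of("alpha", "bravo", "charlie", "delta");
        diff(base, edited).forEach(System.out::println);
    }
}
```

For the example in main this prints the two unchanged lines, then "- charly", "+ charlie" and "+ delta". A run of deletions directly followed by additions is what diff tools present as a changed block; reporting such a pair as "edited" rather than "deleted and added" is a heuristic, because the files themselves do not record which one happened.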

(These links also appear in the sidebar on the right because they are linked here.)

tquadrat