When I commit a Java source file to GitHub and create a pull request, the diff shows the whole file as changed. Selecting "Hide whitespace changes" on the diff screen makes the problem go away.

Could this be related to file encoding? Notepad++ shows ANSI for the same file in both branches. Beyond Compare shows only the changed lines as a diff, unlike GitHub.

As a more general question, do .java files contain an encoding header? Is there a single specific encoding assigned to each file?

Thanks.

Saim Doruklu
    Please don't post multiple (effectively) unrelated questions at once; post them as separate questions instead. I'll "answer" the Java part here: .java source files don't have a dedicated encoding header, and the encoding of Java source files is usually indicated in the build files used to build them (having said that, UTF-8 is so common here as to be the de facto standard). – Joachim Sauer Jun 22 '20 at 10:08

2 Answers

Newline differences are not (usually) related to encoding differences; the cause is more subtle.

A UTF-8 encoded file on Windows might end up with newlines that are represented as \r\n (also known as CRLF) while a UTF-8 encoded file on a Unix-like OS might end up with just \n (also known as just LF).
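To check which kind of newline a file actually contains, you can count the two forms directly. The following is only a minimal sketch: the class name `LineEndingCounter` and its `count` method are made up for illustration, and it works on raw bytes, which is safe for UTF-8 and single-byte "ANSI" encodings because the bytes for \r and \n never appear inside another character there.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of any library): counts CRLF and bare LF line endings. */
final class LineEndingCounter {

    /** Returns a two-element array: {number of CRLF endings, number of bare LF endings}. */
    static int[] count(byte[] data) {
        int crlf = 0;
        int lf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                // An LF directly preceded by CR is a Windows-style ending.
                if (i > 0 && data[i - 1] == '\r') {
                    crlf++;
                } else {
                    lf++;
                }
            }
        }
        return new int[] {crlf, lf};
    }

    public static void main(String[] args) {
        // Same source text, saved with the two different newline conventions.
        byte[] windows = "class A {\r\n}\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] unix = "class A {\n}\n".getBytes(StandardCharsets.UTF_8);

        int[] w = count(windows);
        int[] u = count(unix);
        System.out.println("windows: CRLF=" + w[0] + " LF=" + w[1]); // windows: CRLF=2 LF=0
        System.out.println("unix:    CRLF=" + u[0] + " LF=" + u[1]); // unix:    CRLF=0 LF=2
    }
}
```

If one branch reports only CRLF and the other only LF, every line differs at the byte level even though the visible text is identical, which would explain a whole-file diff that disappears once whitespace changes are hidden.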

This difference is likely to be the cause of your whole-file-diff and can be fixed in different ways:

Joachim Sauer

These differences usually have at least one of the following causes:

Source encoding

Java source files do not declare their own encoding, so you have to agree on one with your team members. Usually there's no good reason to choose anything other than UTF-8. If your preferred editor only supports the system encoding... choose another editor.
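To see why the choice only matters once non-ASCII characters are involved, here is a small sketch that encodes the same text twice and compares the bytes. The class name `SourceEncodingDemo` is made up, and ISO-8859-1 is used as a stand-in for a single-byte "ANSI" code page, which is an assumption on my part; the actual code page depends on the system.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo (not part of any library): compares UTF-8 and ISO-8859-1 encodings of a string. */
final class SourceEncodingDemo {

    /** Returns true if the text is byte-for-byte identical in UTF-8 and ISO-8859-1. */
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        String ascii = "int total = 0;";
        // \u00e9 is a Unicode escape for e-acute, so this file itself stays pure ASCII.
        String accented = "String caf\u00e9;";

        System.out.println(sameBytes(ascii));    // true: ASCII is encoded identically in both
        System.out.println(sameBytes(accented)); // false: one byte in ISO-8859-1, two in UTF-8
    }
}
```

A source file containing only ASCII characters is therefore the same bytes under either encoding, so an editor cannot tell them apart and an encoding mismatch alone would not change every line of such a file. For files that do contain other characters, javac's -encoding option lets the build state the encoding explicitly.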

Line endings

Git supports local configuration of line-ending handling through the core.autocrlf setting. I strongly recommend not using it. Instead, make the configuration project-wide by placing a .gitattributes file in your project root, for example:

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.java text

# Declare files that will always have Unix LF line endings on checkout.
*.sh text eol=lf
Dockerfile text eol=lf

Indentation

A mixture of tabs and spaces can really mess up your commit history. Many IDEs automatically format or pretty-print the code on every save, but they all do it slightly differently. Agree with your team members on a common source format (and be prepared for longer discussions).
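If you want to find out how mixed a file already is before that discussion, a rough check is easy to write. This is just a sketch with made-up names (`IndentationCheck`, `Indent`, `classify`); it looks only at the leading whitespace of each line and knows nothing about your formatter settings.

```java
/** Hypothetical helper (not part of any library): classifies the leading whitespace of a line. */
final class IndentationCheck {

    enum Indent { NONE, TABS, SPACES, MIXED }

    /** Inspects characters up to the first non-tab, non-space character. */
    static Indent classify(String line) {
        boolean sawTab = false;
        boolean sawSpace = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                sawTab = true;
            } else if (c == ' ') {
                sawSpace = true;
            } else {
                break; // the first other character ends the indentation
            }
        }
        if (sawTab && sawSpace) {
            return Indent.MIXED;
        }
        if (sawTab) {
            return Indent.TABS;
        }
        if (sawSpace) {
            return Indent.SPACES;
        }
        return Indent.NONE;
    }

    public static void main(String[] args) {
        String[] lines = {"class A {", "\tint x;", "    int y;", "\t  int z;", "}"};
        for (String line : lines) {
            System.out.println(classify(line) + " <- " + line.trim());
        }
    }
}
```

Running it over the lines of a file and tallying the results shows quickly whether a file is consistently indented or whether different editors have been reformatting it back and forth.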

If you cannot even agree on tabs or spaces, here's a good argument: Developers Who Use Spaces Make More Money Than Those Who Use Tabs

oliver_t