I write code on a Windows 10 machine and upload it to remote Linux machines, where it actually runs, typically using an IDE feature like JetBrains upload or WinSCP. I also do all my version control remotely, usually with the following workflow:

(in remote session)
 1. $ git clone git@github.com:myorg/myrepo.git
(in local)
 2. Download from remote: /myrepo -> C://User/Me/myrepo
 3. Edit some_file.py
 4. Upload to remote: C://User/Me/myrepo/some_file.py -> /myrepo/some_file.py
(in remote session)
 5. $ python some_file.py  # ERROR: something about bad chars or line endings with '\r'
 6. $ sed -i 's/\r//' some_file.py; python some_file.py  # WORKS!
 7. $ git add some_file.py; git commit -m "removed bad win char"

This error and my current way of resolving it are rather annoying. I tried automating the fix with the following bash script, which I put on my $PATH at ~/mytools/remove_win_char.sh:

#!/usr/bin/bash

find . -type f -exec sed -i 's/\r//g' {} \;

Unfortunately this has some unintended side effects in git repos (i.e. this answer does not work):

$ remove_win_char.sh
$ git status
fatal: unknown index entry format 0x2f610000

I tried to fix by specifying only certain files in the script:

find . -name *.py -o -name *.sql -o -name *.sh -exec sed -i 's/\r//g' {} \;

Unfortunately this only seems to hit the .sh files.

Anyone know how to filter .py, .sql, and .sh files only with find? Or know a better way to remove these \r characters created locally by Windows?

Wassadamo
  • Put the filenames with asterisks in single quotes to stop the shell from expanding them: `find . -name '*.py' -o -name '*.sql' ...` – Aplet123 Dec 17 '20 at 02:20
  • And quote `'{}'`. DOS line-endings and UTF-16 are the best way to ruin a good project `:)` (You may also want to add `-type f` just to ensure no stray directories are picked up.) – David C. Rankin Dec 17 '20 at 02:22
  • Does `\r` appear anywhere else besides the `\r\n` CRLF characters on Windows? What about strings "\r"? Can these other cases be avoided? – Wassadamo Dec 17 '20 at 02:53
  • Does this answer your question? [Why should I use core.autocrlf=true in Git?](https://stackoverflow.com/questions/2825428/why-should-i-use-core-autocrlf-true-in-git) – Daniel Mann Dec 17 '20 at 03:45
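
A sketch of how the quoting advice in the comments above could be combined into one command. Beyond quoting, the `-name` tests are grouped with `\( ... \)`, because `-exec` otherwise attaches only to the last `-name` (the implicit "and" binds tighter than `-o`), which would explain why only the .sh files were touched. The function name `strip_cr` is made up for illustration, and `\r` in the sed pattern assumes GNU sed, as found on typical Linux machines. Anchoring the pattern with `$` removes a CR only at the end of a line, so a CR elsewhere in a line is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): strip a trailing CR from every
# line of the *.py, *.sql and *.sh files below a directory, skipping .git.
strip_cr() {
    local root="${1:-.}"
    # -name .git -prune   : never descend into git's own files
    # \( ... \)           : group the -name tests so -exec applies to all of them
    # 's/\r$//'           : GNU sed assumed; removes a CR only at end of line
    find "$root" -name .git -prune -o \
        -type f \( -name '*.py' -o -name '*.sql' -o -name '*.sh' \) \
        -exec sed -i 's/\r$//' {} +
}
```

Running `strip_cr .` from the repository root would leave everything under .git, and any file with another extension, untouched.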

1 Answer

Using find and sed may destroy your repository because they know nothing about git, its internals, or how git treats tracked files: find . also descends into .git, and sed then rewrites git's own binary files (such as the index). Instead, use git ls-files to list the text files git tracks with CR/LF line endings, and then process only those files:

git ls-files --eol

It produces tabular output like

i/lf    w/lf    attr/                   .gitignore
i/crlf  w/crlf  attr/                   README.md
i/lf    w/lf    attr/                   env/install.sh

that can be filtered with awk and cut (unfortunately, I'm not sure whether grep can handle fields), with the resulting files then converted from CR/LF to LF using dos2unix:

git -c core.quotepath=off ls-files --eol '*.py' '*.sql' '*.sh' |  # query git
    awk '$1 ~ /^i\/crlf/' |    # keep only lines whose index column is i/crlf
    cut -f2 |                  # keep the file name only (it is TAB-delimited, see https://git-scm.com/docs/git-ls-files#_output)
    xargs -I{} dos2unix {}     # convert CR/LF to LF
  • @oguzismail thank you for the edit! I didn't even know that `cut` uses TAB as the default delimiter. – terrorrussia-keeps-killing Dec 17 '20 at 06:21
  • You're welcome. The reason why I didn't upvote this answer is that it doesn't account for files with special characters in their names. Otherwise I totally agree that relying on git itself is a safer approach than relying on dos2unix's heuristic checks. – oguz ismail Dec 17 '20 at 06:37
  • @oguzismail Aha, another "didn't know" from my side: I rarely encounter filenames that require octal representation, and mostly saw them in high-level commands like `status` or `show`, so this is why I didn't realize to put that configuration option. Thanks! – terrorrussia-keeps-killing Dec 17 '20 at 06:50
  • Unfortunately this doesn't work for me. I tried running the last block on one line in the root of my repo, but my files' EOLs still show `w/crlf` and running them produces the usual error: `run.sh: line 2: $'\r': command not found` – Wassadamo Dec 18 '20 at 01:31
  • @Wassadamo `w/crlf` means that you have CR/LF-lined files in your working copy (not yet staged with `git add`), whilst the script above filters for indexed files only (it can be modified, of course). Also, I'm not really sure what "the last block" means. Could you elaborate please? – terrorrussia-keeps-killing Dec 18 '20 at 10:16
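
Following up on the last reply: a variant that filters on the working-tree column (`w/crlf`) rather than the index column might look like the sketch below. It is not the answer's original pipeline. The function name `fix_worktree_crlf` is hypothetical, and it swaps dos2unix for GNU sed (an assumption about the remote machine) so that nothing extra needs installing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert tracked *.py, *.sql and *.sh files whose
# working-tree copy has CRLF line endings (w/crlf) to LF.
# In `git ls-files --eol` output, everything before the TAB holds the
# i/, w/ and attr/ columns; the path follows the TAB.
fix_worktree_crlf() {
    git -c core.quotepath=off ls-files --eol -- '*.py' '*.sql' '*.sh' |
        awk -F'\t' '$1 ~ / w\/crlf / { print $2 }' |
        while IFS= read -r path; do
            sed -i 's/\r$//' "$path"   # GNU sed assumed: drop the CR at each line end
        done
}
```

Run from the repository root, this only touches files git already tracks, so a freshly uploaded file that has never been added would not be listed. File names that git still quotes despite `core.quotepath=off` (for example, names containing control characters) are not handled here.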