167

I want to search for files containing DOS line endings with grep on Linux. Something like this:

grep -IUr --color '\r\n' .

The above seems to match a literal `rn`, which is not what is desired.

The output of this will be piped through xargs into fromdos to convert CRLF to LF, like this:

grep -IUrl --color '^M' . | xargs -ifile fromdos 'file'
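
(For clarity, this is what the conversion step is intended to do to a single file; the file name below is just a placeholder, and fromdos from the tofrodos package rewrites the file in place.)

fromdos file-with-crlf.txt   # placeholder name; converts CRLF line endings to LF in place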
pjz
Tim Abell
  • 2
    Have you tried [dos2unix](http://linux.die.net/man/1/dos2unix)? It fixes line endings automatically. – sblundy Sep 16 '08 at 15:54
  • I'm not quite sure but iirc there's a difference between quoting the pattern inside ' and ". Afaik in patterns enclosed in ' the escape sequences are interpreted as proper string so '\r' would be equivalent to "\\r" and "\r" has no equivalent (at least in that notation) with '. – Anticom Oct 16 '14 at 13:10
  • Anticom: You're correct that in this case the difference between ' and " is irrelevant; however, in general they are distinct: '-quoted strings are strong quoted and "-quoted strings are weak quoted. The biggest thing I take advantage of is that $ expansions or `` don't expand in strong quoted strings. See [bash-hackers on quoting](http://wiki.bash-hackers.org/syntax/quoting) for more. – bschlueter Jan 06 '15 at 17:15
  • 4
    The easiest way is to use a modern `dos2unix` with the `-ic` switch. For LF files you may search with `unix2dos -ic`. It doesn't modify files, it only reports. – gavenkoa Feb 21 '17 at 09:50
  • It doesn't answer the question exactly but dos2unix **is the easiest way** to fix this. – jcollum Sep 19 '17 at 22:40
  • 4
    since this is a top answer for any question regarding Windows line endings/carriage returns on Linux, I think it's worth noting that you can *see* them in the terminal with the command `cat -v somefile.txt`; they show up as `^M` – user5359531 Oct 15 '18 at 15:28

9 Answers

216

grep probably isn't the tool you want for this. It will print a line for every matching line in every file. Unless you want to, say, run fromdos 10 times on a 10-line file, grep isn't the best way to go about it. Using find to run file on every file in the tree, then grepping through that for "CRLF", will get you one line of output for each file that has DOS-style line endings:

find . -not -type d -exec file "{}" ";" | grep CRLF

will get you something like:

./1/dos1.txt: ASCII text, with CRLF line terminators
./2/dos2.txt: ASCII text, with CRLF line terminators
./dos.txt: ASCII text, with CRLF line terminators
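
Running file once per batch of files rather than once per file speeds this up considerably; a rough variant (my addition, not part of the original answer), with the caveat that the cut breaks on file names that themselves contain a colon:

find . -not -type d -exec file {} + | grep CRLF | cut -d: -f1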
Thomee
  • I'd already cracked this, but thanks anyway. `grep -IUrl --color '^M' . | xargs -ifile fromdos 'file'` – Tim Abell Sep 16 '08 at 16:15
  • 6
    The -l option to grep tells it to just list files (once) instead of listing the matches in each file. – pjz Sep 19 '08 at 12:40
  • 17
    Not a good solution: it depends on that (undocumented, oriented to human consumption) behaviour of the `file` program. This is very fragile. For (just one) example: it doesn't work with XML files; `file` reports `XML document text` regardless of newline type. – leonbloy Nov 28 '13 at 17:59
  • Referring to my previous comment: to make it somewhat more robust, one can add the option `-M /dev/null` to tell `file` not to look into magic bytes. – leonbloy Nov 29 '13 at 17:05
  • 2
    @leonbloy, the option seems to be a lowercase `-m /dev/null` on my `find (GNU findutils) 4.4.2` (Ubuntu 12.04). – EarlCrapstone Aug 11 '14 at 16:41
  • 10
    I like this answer the best. I simply did ``find . -type f | xargs file | grep CRLF`` – brianz Jan 16 '15 at 17:39
  • Interesting, 'no idea `grep` had that option. Useful for checking streams or individual files, but *incredibly* slow when combined with `find`. 'Better to use @StevenPenny's, lightning quick on the recursive search. – Hashbrown Oct 29 '20 at 04:49
  • @brianz `xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option` – Hashbrown Oct 29 '20 at 04:49
  • Best answer! Even though it's pretty slow, this gives me the correct list of files! – jaques-sam Oct 14 '21 at 08:05
  • It's not a good solution, as @leonbloy wrote. Another example: `file main.js` returns `main.js: AppleWorks Word Processor 0x3d, tabstop ruler "DEL OF", zoomed, paginated, with mail merge, 32/10 inch left margin` (without writing any "CRLF" text, so at the end that case becomes another false negative). Note: `cat -A main.js` shows a ^M at the end of each line of that file. – Ganton Mar 27 '22 at 10:26
  • if you want to convert these files to use LF instead of CRLF: dos2unix `find . -not -type d -exec file "{}" ";" | grep CRLF | cut -d ":" -f1` – kingsjester Jan 25 '23 at 14:49
141

Use Ctrl+V, Ctrl+M to enter a literal Carriage Return character into your grep string. So:

grep -IUr --color "^M"

will work - if the ^M there is a literal CR that you input as I suggested.

If you want the list of files, you want to add the -l option as well.

Explanation

  • -I ignore binary files
  • -U prevents grep from stripping CR characters. By default it does this if it decides it's a text file.
  • -r read all files under each directory recursively.
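
If typing the literal control character is inconvenient, an equivalent that can be pasted directly (bash assumed, since $'...' is a bash quoting form; -l added to list file names, as mentioned above):

grep -IUrl --color $'\r' .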
pjz
  • 4
    As a quick hack that would work, but I think a more human-readable solution would be: grep $'\r' (bash shell only) or grep `printf '\r'` – akostadinov Jun 04 '12 at 12:25
  • 9
    @akostadinov +1, But backticks got interpreted out of your comment ;) The second option would, in other words, be **`grep $(printf '\r')`**. But for most practical uses involving bash, I would stick with `$'\r'`. – jankes Nov 12 '12 at 15:53
  • 3
    Note: The option `-U` is only relevant for Windows (or cygwin), but it's critical there. On Windows, the command will not work without it. – sleske Jul 29 '13 at 09:22
  • 3
    What is the point of option `-I`? By the manual, it seems to me that binary files are considered as non-matching. Shouldn't the combination of `-I` and `-U` (which enforce the binary type) result in all files being considered as non-matching? – Jānis Elmeris Dec 10 '13 at 10:00
  • 4
    You mention the '-l' flag as an add-on option, but I think it should be included in the primary answer because the question essentially asks for a list of files. Also, it results in a faster search. – arr_sea Apr 27 '15 at 19:24
  • @arr_sea and it has the benefit that you actually see something when stdout is a tty. Without the `-l` the trailing CRs just wipe out the lines to the beginning. – Guido Flohr Oct 25 '19 at 11:29
  • On Unix, `-U` seems to cancel the `-I`. I used `grep -Ilr $'\r' .` (using `-l`, as I only wanted filenames). – big_m Jun 24 '20 at 20:14
  • This prints files that start with an M on the first line... – jaques-sam Oct 14 '21 at 08:01
  • The `"^M"` gets interpreted as a regex, matching on `M` characters at the start of a line. – detuur Jun 21 '22 at 15:23
  • If you enter the string as `^M`, it will match `M` at the start of a line. However that's not what the answer is suggesting: the answer is suggesting entering the search string as `Ctrl`+`V`, `Ctrl`+`M`, which (at least on my terminal emulator) will enter a carriage return control character; it looks identical to a separate `^` and `M`, but that's just how the control character gets rendered. – me_and Sep 30 '22 at 14:05
67

Using RipGrep (depending on your shell, you might need to quote the last argument):

rg -l \r
-l, --files-with-matches
Only print the paths with at least one match.

https://github.com/BurntSushi/ripgrep
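
For shells that would otherwise eat the backslash, quoting the pattern is enough; anchoring it to the end of the line narrows the match to CRLF terminators rather than any stray CR (the anchored form is my addition, not part of the original answer):

rg -l '\r'     # any carriage return in a file
rg -l '\r$'    # carriage return at end of line, i.e. CRLF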

Zombo
  • 3
    This doesn't seem to give the correct answer; it gives me all files – jaques-sam Oct 14 '21 at 08:03
19

If your version of grep supports the -P (--perl-regexp) option, then

grep -lUP '\r$'

could be used.
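
A recursive form of the same idea (the -r and the trailing . are my additions; -P requires a grep built with PCRE support, such as GNU grep):

grep -rlUP '\r$' .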

sleske
Linulin
15
# list files containing dos line endings (CRLF)

cr="$(printf "\r")"    # alternative to ctrl-V ctrl-M

grep -Ilsr "${cr}$" . 

grep -Ilsr $'\r$' .   # yet another & even shorter alternative
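
To act on that list, one possible follow-up (a sketch, assuming GNU xargs, dos2unix installed, and file names without embedded newlines):

grep -Ilsr $'\r$' . | xargs -d '\n' dos2unix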
yabt
6

dos2unix has a file information option which can be used to show the files that would be converted:

dos2unix -ic /path/to/file

To do that recursively you can use bash’s globstar option, which for the current shell is enabled with shopt -s globstar:

dos2unix -ic **      # all files recursively
dos2unix -ic **/file # files called “file” recursively

Alternatively you can use find for that:

find -type f -exec dos2unix -ic {} +            # all files recursively (ignoring directories)
find -name file -exec dos2unix -ic {} + # files called “file” recursively
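
Because -ic prints only the names of files that would be converted, the report can be fed back into dos2unix once it looks right (a sketch, assuming GNU xargs and file names without embedded newlines):

find . -type f -exec dos2unix -ic {} + | xargs -d '\n' dos2unix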
Gerold Meisinger
dessert
3

You can use the file command in Unix. It gives you the character encoding of the file along with the line terminators.

$ file myfile
myfile: ISO-8859 text, with CRLF line terminators
$ file myfile | grep -ow CRLF
CRLF  
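
For use in a script, the same check can be reduced to an exit status (a sketch; myfile is a placeholder):

if file myfile | grep -q CRLF; then
    echo "myfile has CRLF line terminators"
fi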
2

The question was about searching... I have a similar issue: somebody submitted mixed line endings into version control, so now we have a bunch of files with 0x0d 0x0d 0x0a line endings. Note that

grep -P '\x0d\x0a'

finds all lines, whereas

grep -P '\x0d\x0d\x0a'

and

grep -P '\x0d\x0d'

find no lines, so there may be something "else" going on inside grep when it comes to line-ending patterns... unfortunately for me!
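
When grep's line-terminator handling gets in the way like this, dumping the raw bytes of a suspect line sidesteps it entirely (a sketch; suspect.txt is a placeholder):

head -n 1 suspect.txt | od -An -t x1   # e.g. ends in 0d 0d 0a for this kind of file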

Zombo
Peter Y
1

If, like me, your minimalist unix doesn't include niceties like the file command, and backslashes in your grep expressions just don't cooperate, try this:

$ for file in `find . -type f` ; do
> dump $file | cut -c9-50 | egrep -m1 -q ' 0d|0d '
> if [ $? -eq 0 ] ; then echo $file ; fi
> done

Modifications you may want to make to the above include:

  • tweak the find command to locate only the files you want to scan
  • change the dump command to od or whatever file dump utility you have
  • confirm that the cut command includes both a leading and trailing space as well as just the hexadecimal character output from the dump utility
  • limit the dump output to the first 1000 characters or so for efficiency

For example, something like this may work for you using od instead of dump:

 od -t x2 -N 1000 $file | cut -c8- | egrep -m1 -q ' 0d|0d |0d$'
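
Putting that together, the whole loop can be rewritten around od; this is only a sketch and, like the original, it mishandles file names containing whitespace. Using -t x1 prints one byte per group, so a single ' 0d' pattern catches every CR byte regardless of word alignment:

for file in `find . -type f` ; do
    od -An -t x1 -N 1000 "$file" | egrep -m1 -q ' 0d' && echo "$file"
done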
MykennaC