1

What I'm attempting to do:

  • Line 1: find every .txt or .TXT file and pipe the file names into the next command
  • Line 2: convert the .txt file to unix format (get rid of Windows line endings)
  • Line 3: delete the last line of the file, which is always blank
find "${TEMPDIR}" -name *.[Tt][Xx][Tt] | /
xargs dos2unix -k | /
dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )

I can't pipe the file name (EDIT: the output) of xargs dos2unix -k | / into the third line. I get the following errors:

stat: cannot stat '': No such file or directory
tail: cannot open '' for reading: No such file or directory
dd: failed to open '': No such file or directory

Clearly I've wrongly assumed that "$_" will be enough to pass the output through the pipe.

How can I pipe the output (a text file) from xargs dos2unix -k into the third line, dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )?

The solution for line 3 comes from an answer to another question on SO about removing the last line from a file, with this answer in particular being touted as a good solution for large files: https://stackoverflow.com/a/17794626/893766

Adrian Torrie
  • 1
    The `dos2unix` program doesn't produce any output, so neither will `xargs dos2unix -k`. It just converts the files silently. – psmears Jun 09 '15 at 09:33
  • 1
    You can also use `find -iname` for case-insensitive search – ColOfAbRiX Jun 09 '15 at 09:36
  • Just don't pipe it; it edits the files in place. Use `;` instead and then continue what you're doing. – 123 Jun 09 '15 at 09:37
  • Thanks for `-iname` flag – Adrian Torrie Jun 09 '15 at 09:51
  • 1
    The `dd` command needs to run on the file you are attempting to modify -- the question you link to specifically cautions about this. Thus, it cannot be run at the end of a pipeline, because there is no file name to edit. – tripleee Jun 09 '15 at 10:16
  • 1
    The slashes after the pipe characters are syntax errors. You probably meant to use backslashes, but they aren't necessary here, either -- the shell knows that a pipeline followed by nothing is a multi-line command which continues on the next line. – tripleee Jun 09 '15 at 10:37
  • I had a nasty bug in my shell script that fixes broken links in `m3u` files. It turns out the m3u files on my computer that predate me switching from Mac to Windows were using the Windows carriage returns. The hard part is - most modern Mac text editors are smart and hide the special characters Windows uses. – Sridhar Sarnobat Jul 25 '18 at 05:31
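
Taken together, the comments above suggest running the two steps one after another on each file name instead of piping between them. A minimal sketch, assuming GNU coreutils (`stat --format`, `dd`) and `dos2unix` are installed and that file names contain no newlines; the function names `strip_last_line` and `convert_tree` are hypothetical and not part of the original question:

```shell
# Hypothetical helper: cut the last line off a file in place,
# using the dd truncation trick from the question.
strip_last_line() {
    size=$(stat --format=%s "$1")      # total size in bytes (GNU stat)
    last=$(tail -n 1 "$1" | wc -c)     # size of the last line in bytes
    dd if=/dev/null of="$1" bs=1 seek=$(( size - last )) 2>/dev/null
}

# Hypothetical driver: handle every .txt file below the directory in $1.
# Assumes file names without embedded newlines.
convert_tree() {
    find "$1" -type f -iname '*.txt' |
    while IFS= read -r file; do
        dos2unix -k "$file"       # converts in place; prints no file name
        strip_last_line "$file"   # works on the file name, not on a pipe
    done
}
```

It could then be called as `convert_tree "${TEMPDIR}"`. The point of the design is that both commands receive the file name as an argument, so nothing has to travel through a pipe between them.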

3 Answers

5

Can this help?

find "${TEMPDIR}" -iname '*.txt' -exec dos2unix "{}" \; -exec sed -i '$d' "{}" \;
anishsane
  • Wow, I didn't know you could supply multiple `-exec` arguments to `find`. – tripleee Jun 09 '15 at 10:12
  • 1
    This is fine, provided that you drop the quotes around `{}`. :) – lcd047 Jun 09 '15 at 10:14
  • ^^ I don't understand. Having quotes around `{}` is wrong, or unnecessary? I tested with a simple `echo` as command & it gave expected results... `find . -exec echo "#{}#" \;` – anishsane Jun 09 '15 at 10:24
  • @tripleee: I didn't know either. Earlier I had combined `-exec` with `-print0` for pipe-lining. So I tried this & it worked. – anishsane Jun 09 '15 at 10:26
  • `find` already does any necessary escaping of the file name when you pass it with `-exec {}` (or rather, it is not exposed to the shell at all, so no escaping is necessary or useful). But the quotes are consumed by the shell, so they are basically harmless here. – tripleee Jun 09 '15 at 10:29
  • 2
    @tripleee: But putting quotes around `{}` doesn't do anything to the *filename*, quoting or otherwise - It won't even reach the `find` process. The quotes just "protect" the `{}` arriving at the `find` process - of course, no such protection is needed, but nor is any harm done :) – psmears Jun 09 '15 at 10:33
  • @psmears Quite so. "Basically harmless here." – tripleee Jun 09 '15 at 10:48
2

You can try replacing dos2unix with an explicit substitution:

find "${TEMPDIR}" -iname '*.txt' -exec cat {} \; |
tr -d '\r' |
...

Because the Windows line ending is \r\n, removing every occurrence of \r with tr leaves plain Unix line endings.

As for the find command, you can use the -iname option for a case-insensitive search and -exec to run a command on each match.
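
To see the effect of `tr -d '\r'` in isolation, here is a small sketch on a throwaway file. The variable names (`sample`, `converted`, `before`, `after`) are made up for the illustration:

```shell
# Build a small sample with Windows (CRLF) line endings.
sample=$(mktemp)
printf 'one\r\ntwo\r\n' > "$sample"

# Strip every carriage return; what remains uses plain LF endings.
converted=$(mktemp)
tr -d '\r' < "$sample" > "$converted"

# Count the carriage returns in each file.
before=$(tr -cd '\r' < "$sample" | wc -c)
after=$(tr -cd '\r' < "$converted" | wc -c)
echo "CRs before: $before, after: $after"
```

Note that `tr` writes to standard output and does not edit the file in place, so the result has to be redirected to a new file, unlike with `dos2unix`.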

ColOfAbRiX
  • I'm not sure, but if you experience issues you can use different commands. Perl can surely handle code pages. Have a look at this page http://www.cyberciti.biz/faq/howto-unix-linux-convert-dos-newlines-cr-lf-unix-text-format/ – ColOfAbRiX Jun 09 '15 at 09:54
1

If the file is really big, you are already messing up the efficiency by rewriting it with tr. Then, you are reading it a second time with tail just to get the index of the last line.

The least inefficient fix I can come up with is to replace dos2unix and dd with just one command which performs both functions, so you only read and write the output file once.

find "${TEMPDIR}" -iname '*.txt' -exec perl -i -ne '
    print $line if defined $line; ($line = $_) =~ s/\015$//' {} \;

Your attempt to use $_ for the current file name doesn't work. The value of $_ is the last argument of the previously completed command; but in the middle of a pipeline, nothing has completed yet. One possible workaround (which I include only for illustration, not as a recommended solution) would be to run everything in xargs, where you have access to {}, similarly to how it works in find -exec.

find "${TEMPDIR}" -iname '*.txt' -print0 |
xargs -r0 -I {} sh -c 'dos2unix -k "{}"
    dd if=/dev/null of="{}" bs=1 seek=$(
        echo $(stat --format=%s "{}" ) - $( tail -n1 "{}" | wc -c) | bc)'

I added -print0 and the corresponding xargs -0 as well as xargs -r as illustrations of good form; though the zero-terminated format began as a non-POSIX extension, so while GNU and BSD tools support it, some other platforms may not.

(Personally, I would probably also replace the seek calculation with a simple Awk script, rather than expend three processes on performing a subtraction.)
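
One possible reading of that remark, sketched under the assumption of GNU `stat`: feed the two numbers to a single Awk process instead of going through `echo` and `bc`. The function name `seek_offset` is hypothetical, and this is only a guess at what such a script might look like:

```shell
# Hypothetical helper: print the byte offset at which the last line starts.
seek_offset() {
    {
        stat --format=%s "$1"       # line 1: total file size in bytes
        tail -n 1 "$1" | wc -c      # line 2: size of the last line in bytes
    } |
    awk 'NR == 1 { size = $1 } NR == 2 { print size - $1 }'
}
```

The result could then be used as `dd if=/dev/null of="$file" bs=1 seek="$(seek_offset "$file")"`.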

tripleee