The key to performance in Bash is to avoid loops in general, and in particular those that call one or more external utilities in each iteration.
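To make that point concrete, here is a minimal sketch contrasting the two styles for a simple task, stripping a .txt suffix from a list of names. The sample names and the array names stems_external and stems_builtin are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: the file names below are hypothetical sample data.
names=( report.txt 'my notes.txt' data.txt )

# Slow pattern: one external process (basename) is spawned per iteration.
stems_external=()
for name in "${names[@]}"; do
  stems_external+=( "$(basename "$name" .txt)" )
done

# Fast pattern: Bash parameter expansion is applied to every array element
# at once, without starting any subprocess.
stems_builtin=( "${names[@]%.txt}" )

# Both arrays hold the same stems; print one of them, one name per line.
printf '%s\n' "${stems_builtin[@]}"
```

With only three names the difference is not measurable, but the cost of the first pattern grows with every additional file, because each iteration starts a new process.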
Here is a solution that uses a single GNU awk command:
awk -v RS='\r\n' '
BEGINFILE { outFile=gensub("\\.txt$", "_unix&", 1, FILENAME) }
{ print > outFile }
' /home/cmccabe/Desktop/files/*.txt
- the option -v RS='\r\n' sets CRLF as the input record separator; because ORS, the output record separator, is left at its default, \n, simply printing each input line terminates it with \n.
- the BEGINFILE block is executed every time processing of a new input file starts; in it, gensub() is used to insert _unix before the .txt suffix of the input file at hand to form the output filename.
- { print > outFile } simply prints the \n-terminated lines to the output file at hand.
Note that the use of a multi-character RS value, the BEGINFILE block, and the gensub() function are GNU extensions to the POSIX standard.
Switching from the OP's sed solution to a GNU awk-based one was necessary in order to provide a single-command solution that is both simpler and faster.
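As a quick sanity check of the awk command above, the following sketch runs it against a throwaway file instead of real data. It assumes GNU awk is installed under the name gawk; the function name demo_awk_crlf, the scratch directory, and the file sample.txt are hypothetical and exist only for this demonstration.

```bash
#!/usr/bin/env bash
# Sketch only: exercises the GNU awk command on a temporary CRLF file.

demo_awk_crlf() {
  local dir
  dir=$(mktemp -d) || return 1

  # Create a two-line sample file with Windows (CRLF) line endings.
  printf 'line one\r\nline two\r\n' > "$dir/sample.txt"

  # Same program as above, pointed at the scratch directory.
  gawk -v RS='\r\n' '
    BEGINFILE { outFile = gensub("\\.txt$", "_unix&", 1, FILENAME) }
    { print > outFile }
  ' "$dir"/*.txt

  # Print the converted file, which should now be \n-terminated only.
  cat "$dir/sample_unix.txt"

  rm -rf "$dir"
}

# Run the demo only if gawk is actually available.
if command -v gawk >/dev/null 2>&1; then
  demo_awk_crlf
else
  echo 'GNU awk (gawk) not found; skipping demo' >&2
fi
```

Piping the output through od -c is one way to confirm that no \r bytes remain.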
Alternatively, here's a solution that relies on dos2unix for conversion of Windows line endings (for instance, you can install dos2unix with sudo apt-get install dos2unix on Debian-based systems); except for requiring dos2unix, it should work on most platforms (no GNU utilities required):
- It uses a loop only to construct the array of filename arguments to pass to dos2unix. This should be fast, because no call to basename is involved; Bash-native parameter expansion is used instead.
- It then uses a single invocation of dos2unix to process all files.
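# cd to the target folder, so that the operations below do not need to handle
# path components. Abort if the folder cannot be entered, so that nothing
# below runs in the wrong directory.
cd '/home/cmccabe/Desktop/files' || exit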
# Collect all *.txt filenames in an array.
inFiles=( *.txt )
# Derive output filenames from it, using Bash parameter expansion:
# '%.txt' matches '.txt' at the end of each array element, and replaces it
# with '_unix.txt', effectively inserting '_unix' before the suffix.
outFiles=( "${inFiles[@]/%.txt/_unix.txt}" )
# Create an interleaved array of *input-output filename pairs* to be passed
# to dos2unix later.
# To inspect the resulting array, run `printf '%s\n' "${fileArgs[@]}"`
# You'll see pairs like these:
# file1.txt
# file1_unix.txt
# ...
fileArgs=(); i=0
for inFile in "${inFiles[@]}"; do
fileArgs+=( "$inFile" "${outFiles[i++]}" )
done
# Now, use a *single* invocation of dos2unix, passing all input-output
# filename pairs at once.
dos2unix -q -n "${fileArgs[@]}"
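To inspect how the argument list is built without touching any files or calling dos2unix, the same logic can be tried on literal names. This is a minimal sketch: the helper build_file_args and the sample filenames are hypothetical, and it fills the global array fileArgs just as the script above does.

```bash
#!/usr/bin/env bash
# Sketch only: builds the interleaved input/output list from names passed
# as arguments; nothing is read from or written to disk.

build_file_args() {
  local -a inFiles=( "$@" )
  # Insert '_unix' before the '.txt' suffix of every element.
  local -a outFiles=( "${inFiles[@]/%.txt/_unix.txt}" )
  local i

  fileArgs=()   # global result array
  for i in "${!inFiles[@]}"; do
    fileArgs+=( "${inFiles[i]}" "${outFiles[i]}" )
  done
}

# Hypothetical sample names, including one that contains a space.
build_file_args 'file1.txt' 'my notes.txt'
printf '%s\n' "${fileArgs[@]}"
```

This prints file1.txt, file1_unix.txt, my notes.txt, and my notes_unix.txt, one per line, which is the pair order that dos2unix -n expects. One caveat for the real script: if no *.txt file exists, Bash by default leaves the pattern *.txt unexpanded, so enabling shopt -s nullglob and checking that the array is non-empty before calling dos2unix may be worth considering.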