2

I'm not very experienced in bash scripting, so I'm learning it by practice. Recently I was trying to write a simple script that lists all files of at least 1 GB, and I ran into a problem escaping whitespace in file names. It works fine in the terminal if I run:

$ find /home/dem -size +1000M -print|sed -e 's/ /\\ /'
/home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar
/home/dem/VirtualBox\ VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox\ VMs/Win7/Win7-test.vdi
/home/dem/VirtualBox\ VMs/FreeBSD9.1/FreeBSD9.1.vdi
/home/dem/VirtualBox\ VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox\ VMs/Beini-1.2.3/Beini-1.2.3.vdi
/home/dem/VirtualBox\ VMs/BackTrack5RC3/BackTrack5RC3.vdi
/home/dem/VirtualBox\ VMs/WinXPx32/WinXPx32.vdi

But in this script:

#!/bin/bash

for i in "$( find /home/dem -size +1000M -print|sed -e 's/ /\\ /' )"
 do 
  res="$( ls -lh $i )"
  echo $res
done 

It gives errors, and as you can see, the paths are split at the whitespace:

ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Win7/Win7-test.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/FreeBSD9.1/FreeBSD9.1.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Beini-1.2.3/Beini-1.2.3.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/BackTrack5RC3/BackTrack5RC3.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/WinXPx32/WinXPx32.vdi: No such file or directory
-rw-rw-r-- 1 dem dem 3.1G Jul 13 02:54 /home/dem/Downloads/BT5R3-GNOME-32/BT5R3-GNOME-32.iso -rw------- 1 dem dem 1.1G Dec 27 2012 /home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar

I need the script to handle files with whitespace in their names and to show the actual size of each file, as ls -lh does. Without the sed formatting:

$ find /home/dem -size +1000M -print
/home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar
/home/dem/VirtualBox VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox VMs/Win7/Win7-test.vdi
/home/dem/VirtualBox VMs/FreeBSD9.1/FreeBSD9.1.vdi
/home/dem/VirtualBox VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox VMs/Beini-1.2.3/Beini-1.2.3.vdi
/home/dem/VirtualBox VMs/BackTrack5RC3/BackTrack5RC3.vdi
/home/dem/VirtualBox VMs/WinXPx32/WinXPx32.vdi
Demontager

3 Answers

3

xargs is great for simple cases, though it needs -0 (NUL-delimited inputs) to behave correctly when handling filenames with newlines in their paths (which are legal on UNIX). If you really do need to read the filenames into a shell script, you can do it like so:

while IFS='' read -r -d '' filename; do
  ls -lh "$filename"
done < <(find /home/dem -size +1000M -print0)

...or like so, using functionality in modern versions of the POSIX standard for find to duplicate the behavior of xargs:

find /home/dem -size +1000M -exec ls -lh '{}' +
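A minimal sketch of the same `-exec ... {} +` form, run against a throwaway directory so it can be tried safely. The directory layout and file names here are hypothetical stand-ins for /home/dem, and `-type f` replaces the `-size +1000M` test only so the tiny sample files match:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for /home/dem.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/VirtualBox VMs"
printf 'x' > "$tmpdir/VirtualBox VMs/disk one.vdi"
printf 'x' > "$tmpdir/plain.rar"

# find hands each path to ls as its own argument, so whitespace
# in a name never goes through shell word splitting.
listing=$(find "$tmpdir" -type f -exec ls -lh '{}' +)
printf '%s\n' "$listing"

rm -rf "$tmpdir"
```

Because find builds the argument list itself, no escaping with sed is needed at any point.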
Charles Duffy
  • About the options supplied to while read: I assume IFS='' sets the field separator. Since -print0 produces NUL separators, that part is obvious. But what -r and -d do is not clear to me. – Demontager Aug 16 '13 at 14:21
  • @user1940704 `-d ''` indicates that the NUL character, not the newline, separates records. `IFS=''` means that there are no _field_ separator characters (as opposed to record separators), which prevents trailing whitespace from being stripped. `-r` prevents backslash escape sequences from being processed by read (and really should be what folks use by default everywhere, unless they're very sure that they _do_ want read to interpret escape sequences for them). – Charles Duffy Aug 16 '13 at 17:39
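The effect of these options can be seen in a small sketch. The sample record is hypothetical: it carries leading and trailing spaces plus a backslash, which is exactly what `IFS=''` and `-r` are meant to protect:

```shell
#!/usr/bin/env bash
# Hypothetical record with surrounding spaces and a literal backslash.
sample='  a\tb  '

# IFS='' keeps the surrounding whitespace; -r keeps the backslash;
# -d '' makes read stop at the NUL instead of at a newline.
IFS='' read -r -d '' safe < <(printf '%s\0' "$sample")

# Default IFS and no -r: whitespace is trimmed and the backslash is consumed.
read -d '' unsafe < <(printf '%s\0' "$sample")

printf 'safe=[%s]\n' "$safe"
printf 'unsafe=[%s]\n' "$unsafe"
```

Only the first form returns the record exactly as it was written.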
2

Simply use xargs:

find /home/dem -size +1000M -print0 | xargs -0 ls -lh
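To see why the `-print0` / `-0` pairing matters, here is a small sketch that counts how many arguments xargs passes along. The two names are hypothetical, and `sh -c '... "$#"' _` is used only as a stand-in command that reports its argument count:

```shell
#!/usr/bin/env bash
# NUL-delimited input: each name arrives as exactly one argument.
nul_args=$(printf '%s\0' 'a b' 'c d' | xargs -0 sh -c 'printf "%s\n" "$#"' _)

# Newline-delimited input without -0: xargs also splits on the spaces.
plain_args=$(printf '%s\n' 'a b' 'c d' | xargs sh -c 'printf "%s\n" "$#"' _)

printf 'with -0: %s arguments\n' "$nul_args"
printf 'without -0: %s arguments\n' "$plain_args"
```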
Ansgar Wiechers
  • If I use find /home/dem -size +1000M -print0 | xargs -0 ls -lh in a bash script, it gives ugly output (without line breaks), though it works fine when executed directly in the terminal. – Demontager Aug 12 '13 at 06:39
  • Worked just fine when I ran it from a script. – Ansgar Wiechers Aug 12 '13 at 06:45
  • Charles Duffy answered my question about how to do this safely in bash with a while read loop. I only want to know about the options supplied to while read: I assume IFS='' sets the field separator. Since -print0 produces NUL separators, that part is obvious. But what -r and -d do is not clear to me. – Demontager Aug 12 '13 at 06:46
  • Why are you asking me instead of the person who provided that answer? – Ansgar Wiechers Aug 12 '13 at 06:58
  • Yes, sorry for asking here. I was running the above command without quotes, which is why the output was unformatted. Yes, it is another option to use instead of the while loop. Thanks, Ansgar – Demontager Aug 16 '13 at 14:24
2

In a shell script, parameters are separated by whitespace, which is troublesome when file names contain whitespace. This is a problem in a for loop because the loop treats each run of whitespace as a parameter separator:

$ ls -l
this is file number one
this is file number two

$ for file in $(find . -type f)
> do
>     echo "My file is '$file'"
> done
My file is './this'
My file is 'is'
My file is 'file'
My file is 'number'
My file is 'one'
My file is './this'
My file is 'is'
My file is 'file'
My file is 'number'
My file is 'two'

In this case, the for loop is treating each whitespace-separated word as a separate file, which is not what you want. There are other issues with for:

  • The for loop cannot start until it finishes processing the command in the $(...).
  • It is possible to overrun your command line buffer. What the shell does is execute the command in $(...) and then replace the $(...) with the results of that command. If you used a find command that returned a few hundred thousand files, you will probably overrun your command line buffer. Even worse, it will happen silently. Unless you take a look, you will never know that files were dropped. In fact, I've seen cases where someone tests a shell script using this type of for ... $(...) loop and thinks everything is great, but then the command fails in a very critical situation.
  • It is inefficient because it has to spawn a separate shell process. Okay, it's not that big a deal anymore, but still...

A better way to handle this is to use a while read loop. IN BASH, it would look like this:

find ... -print0 | while read -d $'\0' file
do
   ....
done

The -print0 parameter prints all found files, but separates them with a NUL character. The while read -d $'\0' ... syntax splits the input on the NUL character and not on newlines as it normally does. Thus, even if your file names contain newlines (and file names in Unix are allowed to contain newlines), the while read -d $'\0' ... loop will still read them properly.
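The newline case can be checked with a short sketch. The scratch directory and the file name are hypothetical; the loop adds `IFS=` and `-r` to the read call, as recommended in the comments on the other answer, and `-d ''` is used as the equivalent spelling of `-d $'\0'`:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding one file whose name contains a newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/line one
line two"

# Newline-delimited output: the single file shows up as two lines.
line_count=$(find "$tmpdir" -type f -print | grep -c '')

# NUL-delimited output: the single file is read as one record.
record_count=0
while IFS= read -r -d '' file; do
  record_count=$((record_count + 1))
done < <(find "$tmpdir" -type f -print0)

printf 'lines=%s records=%s\n' "$line_count" "$record_count"
rm -rf "$tmpdir"
```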

Even better, this solves a few other problems:

  • The command line buffer can't be overloaded.
  • Your while read loop will execute in parallel with the find. No need for the find to find all of your files first.
  • You're not spawning a separate process.
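One caveat worth checking for yourself: in bash with default settings, the loop on the receiving end of a pipeline runs in a subshell, so variables set inside it are not visible afterwards. The sketch below compares the pipe form with the `done < <(...)` form; `printf` stands in for find, and the names are hypothetical:

```shell
#!/usr/bin/env bash
# Pipe form: the while loop runs in a subshell, so the counter is lost.
count_pipe=0
printf '%s\0' 'one file' 'two file' | while IFS= read -r -d '' f; do
  count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell.
count_procsub=0
while IFS= read -r -d '' f; do
  count_procsub=$((count_procsub + 1))
done < <(printf '%s\0' 'one file' 'two file')

printf 'pipe=%s procsub=%s\n' "$count_pipe" "$count_procsub"
```

If the loop only prints or acts on each file, either form works; the difference matters when the loop has to leave results behind in variables.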

Observe:

$ ls -l
this is file number one
this is file number two

$ find . -type f -print0 | while read -d $'\0' file
> do
>     echo "My file is '$file'"
> done
My file is './this is file number one'
My file is './this is file number two'

By the way, another command called xargs has a similar parameter:

find . -type f -mtime +100 -print0 | xargs -0 rm

The xargs command takes the file names from STDIN and passes them to the command it is given. It guarantees that the parameters passed will not overrun the command line buffer. If they would, xargs runs the command passed to it multiple times.
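The batching behaviour can be made visible with a small sketch. Here `-n 2` is used only to force tiny batches for illustration (normally xargs picks the batch size from the system limit), and the single-letter names are hypothetical:

```shell
#!/usr/bin/env bash
# Five NUL-delimited names, at most two per invocation of echo:
# xargs runs echo three times, once per batch.
batches=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo)
printf '%s\n' "$batches"
```

Each output line corresponds to one separate run of the command.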

Normally (like for), xargs splits file names on whitespace. However, you can pass it a parameter to split names on NULs.

THIS PARAMETER DIFFERS FROM SYSTEM TO SYSTEM

Sorry for the shouting, but I need to make this very clear. Different systems have different parameters for the xargs command, and you need to refer to the manpage to see which parameter your system takes. On my Mac, it is the -0. On GNU, it is --null although some Linux distributions take -0 too. And, some Unix versions may not even have this parameter.

David W.
  • No GNU system takes `--null` but not `-0`. – Charles Duffy Aug 11 '13 at 03:55
  • ...also, `-d\$0` is _completely_ different from `-d $'\0'` (which is a method some folks prefer to write the functional equivalent `-d ''`, as `$'\0'` can't be represented in a NUL-terminated string). `-d\$0` is the same as `-d'$'`, which expects dollar-sign delimiters, not NUL delimiters. – Charles Duffy Aug 11 '13 at 03:56
  • @CharlesDuffy you got me with the `-d $'\0'`. My mistake. I'm going to fix it in my answer. However, according to the [GNU documentation](http://www.gnu.org/software/findutils/manual/html_mono/find.html#xargs-options), xargs will take either `--null` or `-0` – David W. Aug 11 '13 at 19:23
  • ...which is my point; GNU xargs takes either, and introduced both at the same time, so the claim that some versions take only `--null` and others take only `-0` is, as yet, unsubstantiated. – Charles Duffy Aug 11 '13 at 19:55
  • @CharlesDuffy You said _GNU system takes --null but not -0_. – David W. Aug 12 '13 at 01:09
  • Two grammatical errors would be necessary for that interpretation: An elided comma, and incorrect pluralization. (It would need to be "No**,** GNU system**s** take[...]" to be correct English). Isn't it clearer to read things as they're actually written? – Charles Duffy Aug 12 '13 at 12:37