
I often have a command that processes one file, and I want to run it on every file in a directory. Is there any built-in way to do this?

For example, say I have a program data which outputs an important number about a file:

$ ./data foo
137
$ ./data bar
42

I want to run it on every file in the directory in some manner like this:

map data `ls *`
ls * | map data

to yield output like this:

foo: 137
bar: 42
– Claudiu

12 Answers

If you are just trying to execute your data program on a bunch of files, the easiest/least complicated way is to use -exec in find.

Say you wanted to execute data on all txt files in the current directory (and subdirectories). This is all you'd need:

find . -name "*.txt" -exec data {} \;

If you wanted to restrict it to the current directory, you could do this:

find . -maxdepth 1 -name "*.txt" -exec data {} \;

There are lots of options with find.
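For instance, find can also produce the name: number output shown in the question. A sketch (assuming, as in the commands above, that data is on your PATH):

find . -maxdepth 1 -name "*.txt" -exec sh -c 'printf "%s: " "$1"; data "$1"' _ {} \;

The _ fills $0 and find substitutes each filename for {} as $1, so names containing spaces or quotes pass through intact.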

– Daniel Haley

If you just want to run a command on every file you can do this:

for i in *; do data "$i"; done

If you also wish to display the filename that it is currently working on then you could use this:

for i in *; do echo -n "$i: "; data "$i"; done
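Note that echo -n is not portable to every shell; if that matters, a printf variant does the same job:

for i in *; do printf '%s: ' "$i"; data "$i"; done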
– Mark Byers
  • With the caveat of quoting `$i` so that files with spaces in their names don't get treated as multiple arguments to whatever program is being called – Daniel DiPaolo Apr 14 '10 at 19:29
  • You can get away with a simple for loop in this case because the `ls` can be turned into glob expansion. If you actually want to use the output of a command, the for loop will split on all embedded whitespace, so you'll probably want to set `$IFS` to only newlines - see my answer if that's necessary. – Cascabel Apr 14 '10 at 19:32

It looks like you want xargs:

find . -maxdepth 1 | xargs -d'\n' data

To print each command first, it gets a little more complex:

find . -maxdepth 1 | xargs -d'\n' -I {} bash -c "echo {}; data {}"
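Splicing {} directly into the bash -c string breaks on filenames that contain quotes or $. A safer sketch passes the name in as a positional argument instead (still assuming GNU xargs for -d):

find . -maxdepth 1 -type f | xargs -d'\n' -I {} bash -c 'echo "$1"; data "$1"' _ {}

The -type f also keeps find from handing the leading . entry (and any directories) to data.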
– Stephen
  • Ah nice, most concise one so far. Is there any easy way to also print the file it's currently working on? – Claudiu Apr 14 '10 at 19:35
  • ls is not supposed to be used that way. Instead, ls is intended to present a listing to the user: it may replace unprintable characters, reformat the listing, etc. – Juliano Apr 15 '10 at 13:45
  • @Juliano, fair enough. switched to find. – Stephen Apr 15 '10 at 15:09

You should avoid parsing ls:

find . -maxdepth 1 | while IFS= read -r file; do do_something_with "$file"; done

or

while IFS= read -r file; do do_something_with "$file"; done < <(find . -maxdepth 1)

The latter doesn't run the while loop in a subshell, so variables set inside the loop remain visible after it finishes.
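A quick way to see the difference is to set a variable in the loop (a bash sketch):

count=0
find . -maxdepth 1 | while read -r file; do count=$((count + 1)); done
echo "$count"   # prints 0 - the loop ran in a subshell

count=0
while read -r file; do count=$((count + 1)); done < <(find . -maxdepth 1)
echo "$count"   # prints the number of entries - the loop ran in this shell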

– Dennis Williamson

The common methods are:

ls * | while read file; do data "$file"; done

for file in $(ls *); do data "$file"; done

The second can run into problems if you have whitespace in filenames; in that case you'd probably want to make sure it runs in a subshell, and set IFS:

( IFS=$'\n'; for file in $(ls *); do data "$file"; done )

You can easily wrap the first one up in a script:

#!/bin/bash
# map.bash - run the command given as $1 once per line of stdin

while IFS= read -r file; do
    "$1" "$file"
done

which can be executed as you requested - just be careful never to accidentally execute anything dumb with it. The benefit of a looping construct is that you can easily place multiple commands inside it as part of a one-liner; with xargs you'd have to put them in an executable script for it to run.
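For example, assuming the script above is saved as map.bash in the current directory:

$ chmod +x map.bash
$ ls | ./map.bash ./data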

Of course, you can also just use the utility xargs:

find * -maxdepth 0 | xargs -n 1 data

If you generate the list with ls instead, make sure indicators are turned off (ls --indicator-style=none) if you normally use them, or the @ appended to symlinks will turn them into nonexistent filenames.

– Cascabel
  • use `for file in *` instead of `for file in $(ls *)` – glenn jackman Apr 14 '10 at 23:33
  • @glenn jackman: I realize that, and it was covered in another answer. I was attempting to provide the general answer here, because it's not always simple globbing that gets you your filename list. It can be `grep -l`, `find ...`, who knows. – Cascabel Apr 15 '10 at 06:12
  • You are correct on `grep -l` and `find`; I will +1 if you replace them in your answer. You can safely parse their well-defined output. `ls` has an undocumented output format and thus is a different story. – ignis Dec 10 '12 at 19:48

GNU Parallel specializes in exactly this kind of mapping:

parallel data ::: *

It will run one job on each CPU core in parallel.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
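To label each result with the file it came from, close to the foo: 137 output asked for, GNU Parallel's --tag option prefixes every output line with the corresponding argument (tab-separated):

parallel --tag data ::: *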

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Diagram: simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Diagram: GNU Parallel's scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

– Ole Tange

Since you specifically asked about this in terms of "map", I thought I'd share this function I have in my personal shell library:

# map_lines: evaluate a command for each line of input
map_lines()
{
        while IFS= read -r line ; do
                "$1" "$line"
        done
}

I use this in the manner that you proposed for a solution:

$ ls | map_lines ./data

I named it map_lines instead of map as I assumed some day I might implement a map_args, which you would use like this:

$ map_args ./data *

That function would look like this:

# map_args: run a command once per argument
map_args()
{
    cmd="$1" ; shift
    for arg ; do
        "$cmd" "$arg"
    done
}
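Since these are shell functions rather than standalone scripts, they need to be sourced into the current shell before use; for example (the library path here is made up):

$ . ~/lib/shell-functions.sh
$ ls | map_lines ./data
$ map_args ./data *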
– camh

Try this:

for i in *; do echo "${i}: $(data "$i")"; done
– Juha Syrjälä

You can create a shell script like so:

#!/bin/bash
cd /path/to/your/dir
for file in * ; do
  ./data "$file"
done

That loops through every file in /path/to/your/dir and runs your "data" script on it. Be sure to chmod the above script so that it is executable.

– Banjer

You could also use PRLL.

– raspi

Parsing the output of ls doesn't handle blanks, linefeeds and other funky stuff in filenames, so it should be avoided where possible.

find is only needed if you want to dive into subdirs, or if you want to use its other options (mtime, size, you name it).

But many commands handle multiple files themselves, so you often don't need a for loop. Instead of:

for d in * ; do du -s "$d"; done

you can simply write:

du -s *
md5sum e* 
identify *jpg
grep bash ../*.sh
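grep in particular gets close to the exact output in the question: given multiple files, it prefixes each match with the filename, and with -c it prints one file:count line per file:

grep -c bash ../*.sh   # one "file:count" line per .sh file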
– user unknown

I have just written this script specifically to address the same need.

http://gist.github.com/kindaro/4ba601d19f09331750bd

It uses find to build the set of files to transform, which allows finer selection of the files to map over, but also leaves a window for harder mistakes.

I designed two modes of operation: the first mode runs a command with "source file" and "target file" arguments, while the second mode supplies source file contents to a command as stdin and writes its stdout into a target file.

We may further consider adding support for parallel execution, and maybe limiting the set of custom find arguments to the few most necessary ones. I am not really sure if that's the right thing to do.
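For the curious, the stdin-to-stdout mode boils down to something like this sketch (independent of the linked script; transform, src/ and out/ are placeholder names):

mkdir -p out
for f in src/*; do
    transform < "$f" > "out/$(basename "$f")"   # filter each file into out/
done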

– Ignat Insarov