
I am pretty new to shell scripting and I have been struggling all day to figure out how to write a "for" loop. Essentially, what I am trying to do is the following:

I have a list.txt file with a bunch of names:

name1
name2
name3

For every name in the list, there are two files, each with a different suffix appended to the name. For example:

name1_R1
name1_R2

The program I am trying to run is called sickle. Basically, it takes two files (that correspond to each other) and runs an analysis on them, hence requiring me to have this naming scheme. The sickle command is as follows:

sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \

If someone could help me out, even just by telling me how to get Unix to read the list and treat each line independently, I think I could go from there. I tried a few things, but none of them worked.

Leandro Papasidero
user2647734
  • Welcome to Stack Overflow. Please read the [About] page soon. Your example command line doesn't bear much relation to the names you've listed in your file or generated from `name1`, which makes it hard to guess what you're really wanting to see. Consistency in writing your question makes it easier to give you a useful answer. Please show the exact command line you want generated for the file base `name1`. What is the significance of the trailing backslash? Also, it is a good idea to show some of what you've tried rather than just abstractly claim they don't work. – Jonathan Leffler Aug 03 '13 at 02:34
  • See also [Looping over pairs of values in bash](/questions/28725333/looping-over-pairs-of-values-in-bash) – tripleee Apr 13 '19 at 07:02

3 Answers

There are a couple of ways to do it. Since the names are 'one per line' in the data file, we can assume there are no newlines in the file names.

for loop

for file in $(<list.txt)
do
    sickle pe -f "${file}_file1.fastq" -r "${file}_file2.fastq" -t sanger
done

while loop with read

while read file
do
    sickle pe -f "${file}_file1.fastq" -r "${file}_file2.fastq" -t sanger
done < list.txt

The for loop only works if there are no blanks in the names (nor other white-space characters such as tabs) and no glob characters such as * or ?, because the unquoted $(<list.txt) is both split into words and expanded as file-name patterns. The while loop is clean as long as you don't have newlines in the names, though using while read -r file would give you even better protection against the unexpected, since -r stops backslashes in the names from being treated as escape characters. The double quotes around the file name in the for loop are decorative (but harmless) because the file names cannot contain blanks, but those in the while loop prevent file names containing blanks from being split when they should not be split. It's often a good idea to quote variables every time you use them, though it strictly only matters when the variable might contain blanks (or glob characters) and you don't want the value split up.
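A small sketch of the splitting behaviour described above, assuming Bash (for the $(<file) form). The helper names items_via_for and items_via_while are hypothetical; each just prints every item its loop receives in brackets, so a name containing a blank shows up as two items in the for version and as one item in the while version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print each item the loop sees, wrapped in [ ].
items_via_for() {
    local file
    # Unquoted $(<file) is split on blanks, tabs and newlines (and globbed).
    for file in $(<"$1"); do
        printf '[%s]\n' "$file"
    done
}

items_via_while() {
    local file
    # One iteration per line; IFS= and -r keep the line exactly as written.
    while IFS= read -r file; do
        printf '[%s]\n' "$file"
    done < "$1"
}

demo_list=$(mktemp)
printf '%s\n' name1 'name 2' > "$demo_list"
items_via_for "$demo_list"     # prints [name1], [name], [2] on separate lines
items_via_while "$demo_list"   # prints [name1], [name 2] on separate lines
rm -f "$demo_list"
```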

I've had to guess what names should be passed to the sickle command since your question is not clear about it — I'm 99% sure I've guessed wrong, but it matches the different suffixes in your sample command assuming the base name of file is input. I've omitted the trailing backslash; it is the 'escape' character and it is not clear what you really want there.
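Combining the read -r advice with the _R1/_R2 suffixes from the question gives something like the following sketch. It assumes the pair files are called <name>_R1.fastq and <name>_R2.fastq (the .fastq extension is a guess based on the sample sickle command), and print_sickle_cmds is a hypothetical helper that only echoes each command as a dry run. IFS= additionally preserves leading and trailing blanks, and the || [ -n "$file" ] test lets the loop handle a final line that lacks a trailing newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sickle command for each base name in a list file.
# Assumption: the pair files are named <name>_R1.fastq and <name>_R2.fastq.
# "echo" makes this a dry run; drop it to actually run sickle.
print_sickle_cmds() {
    local file
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal;
    # the "|| [ -n ... ]" test still processes a last line with no newline.
    while IFS= read -r file || [ -n "$file" ]; do
        echo sickle pe -f "${file}_R1.fastq" -r "${file}_R2.fastq" -t sanger
    done < "$1"
}

# Demo: the last name deliberately has no trailing newline.
demo_list=$(mktemp)
printf 'name1\nname2\nname3' > "$demo_list"
print_sickle_cmds "$demo_list"
rm -f "$demo_list"
```

Once the printed commands look right, removing echo would run sickle itself.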

Jonathan Leffler
  • Worked like a charm. Thanks a lot! Yeah, I didn't want to put the names in the list because I thought it would be simpler this way. – user2647734 Aug 03 '13 at 04:10
  • With http://mywiki.wooledge.org/DontReadLinesWithFor the `while` loop should really be the first suggestion here. – tripleee Jun 30 '15 at 04:31

Use a Bash For-Loop

Bash has a very reasonable for-loop as one of its looping constructs. You can replace the echo command below with whatever custom command you want. For example:

for file in name1 name2 name3; do
  echo "${file}_R1" "${file}_R2"
done

The idea is that the loop assigns each filename to the file variable, then you append the _R1 and _R2 suffixes to them. Note that quoting may be important, and does no harm if it isn't needed, so you ought to use it as a defensive programming measure.
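To see why the quoting matters, here is a small Bash sketch using a hypothetical count_args function that simply reports how many arguments it was given, with a base name that contains a blank:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

file='name 1'   # a base name containing a blank

# Unquoted: each expansion is split into two words, so 4 arguments arrive.
count_args ${file}_R1 ${file}_R2
# Quoted: each suffixed name stays a single argument, so 2 arguments arrive.
count_args "${file}_R1" "${file}_R2"
```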

Use xargs for Argument Lists

If you want to read from a file instead of using the for-loop directly, you can use Bash's read builtin, but xargs is often more portable across shells. For example, the following uses flags available in the version of xargs from GNU findutils to read in arguments from a file and then append a suffix to each of them:

$ xargs --arg-file=list.txt -I{} /bin/echo "{}_R1" "{}_R2"
name1_R1 name1_R2
name2_R1 name2_R2
name3_R1 name3_R2

Again, you can replace "echo" with the command line of your choice.
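For the pairing in the question, that command line could be the sickle call itself. This is a sketch only: it assumes <name>_R1.fastq and <name>_R2.fastq file names, keeps a leading echo as a dry run, and reads the list on standard input so it does not depend on the GNU-only --arg-file option. xargs_sickle_cmds is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build the sickle command line for each name in a list file.
# Assumption: pair files are named <name>_R1.fastq and <name>_R2.fastq.
# The leading "echo" makes it a dry run.
xargs_sickle_cmds() {
    # -I{} runs the command once per input line; redirecting the file to
    # stdin avoids the GNU-only --arg-file option.
    xargs -I{} echo sickle pe -f "{}_R1.fastq" -r "{}_R2.fastq" -t sanger < "$1"
}

demo_list=$(mktemp)
printf '%s\n' name1 name2 name3 > "$demo_list"
xargs_sickle_cmds "$demo_list"
rm -f "$demo_list"
```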

Todd A. Jacobs
  • Relying on GNU xargs is often less portable than just relying on Bash. A lot of systems ship with Bash (even if your shell isn't), but far fewer systems ship with the GNU utilities (for example, Solaris, OS X, BSDs, ...) – nneonneo Aug 03 '13 at 03:18
  • If your version of `xargs` has no `--arg-file` option you can also just redirect the file to `STDIN`: `xargs -I{} /bin/echo "{}_R1" "{}_R2" < list.txt` – mschilli Aug 30 '13 at 08:26

Use a while loop with read:

while read fn; do
    <command> "${fn}_R1" "${fn}_R2"
done < list.txt
nneonneo
  • This has the same problem your original version with `` `cat fn` `` did: filenames with spaces won't work, because command would receive too many args. Try instead `"${fn}_R1"`, `"${fn}_R2"`. – amalloy Aug 03 '13 at 02:40
  • Thanks, amended. I used to write scripts without making them space-safe, but recently have tried to learn space-safe ways of doing things. Still adjusting ;) – nneonneo Aug 03 '13 at 02:41
  • You don't even need to specify *fn*. Bash automatically assigns to *REPLY*, which is generally sufficient unless you are assigning to more than one variable with each read. – Todd A. Jacobs Aug 03 '13 at 02:50
  • @CodeGnome: I like being explicit about my variable names. Writing `while read; do $REPLY` is just a bit too 'magical' for my tastes. – nneonneo Aug 03 '13 at 03:19