10

I need to make a list of a large number of files (40,000 files) like below:

ERR001268_1_100.fastq  ERR001268_2_156.fastq  ERR001753_2_78.fastq
ERR001268_1_101.fastq  ERR001268_2_157.fastq  ERR001753_2_79.fastq
ERR001268_1_102.fastq  ERR001268_2_158.fastq  ERR001753_2_7.fastq
ERR001268_1_103.fastq  ERR001268_2_159.fastq  ERR001753_2_80.fastq

My command is:

ls ERR*_1_*.fastq | sed 's/\.fastq//g' | sort -n > masterlist

However, it fails with this error:

bash: /bin/ls: Argument list too long

How can I solve this problem? Is there another way to make a list like this with Perl or Python?

thx

Leandro Papasidero
LookIntoEast

4 Answers

16

You should be able to replace ls ERR*_1_*.fastq with find . -name "ERR*_1_*.fastq".
This way, you can avoid having the wildcard expand into a huge argument list.

(The find output will include a leading "./", e.g. ./ERR001268_1_100.fastq . If that's undesirable, you can get rid of it with another sed command later in the pipeline.)
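Since the question also asks about Python, the same idea (matching the pattern inside one process rather than letting the shell expand it) can be sketched as below. This is a minimal sketch, not a drop-in replacement for find: the function name `list_matching` and its parameters are hypothetical, and it assumes all the files sit in a single directory.

```python
import fnmatch
import os


def list_matching(directory=".", pattern="ERR*_1_*.fastq", suffix=".fastq"):
    """Return sorted names in `directory` matching `pattern`, minus `suffix`.

    Hypothetical helper for illustration; the names are not from the thread.
    """
    names = []
    # os.scandir reads the directory inside this process, so no command-line
    # argument list is ever built and "Argument list too long" cannot occur.
    with os.scandir(directory) as entries:
        for entry in entries:
            name = entry.name
            if fnmatch.fnmatchcase(name, pattern):
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
                names.append(name)
    # Plain string sort; names come back without a leading "./".
    return sorted(names)
```

Writing the result with something like `open("masterlist", "w").write("\n".join(list_matching()) + "\n")` would produce a file comparable to the one the original pipeline was meant to create, although the order is alphabetical rather than numeric.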

Jim Lewis
1

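If the files all already exist in your directory, Python's "glob" module avoids the problem: it expands the pattern inside the Python process, so the matches are never passed as a command-line argument list and the "Argument list too long" limit does not apply.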

From the command line:

python -c "import glob; print(glob.glob('ERR*_1_*.fastq'))"

To do the whole thing in Python, you could try something like this:

import glob
# Same pattern as the ls command in the question
files = glob.glob("ERR*_1_*.fastq")
# Drop the .fastq extension from each name
trimmedfiles = [x.replace(".fastq", "") for x in files]
trimmedfiles.sort()
for f in trimmedfiles:
    print(f)

This solution sorts the names alphabetically, not numerically. To sort on the numeric last field instead, pass a key function to the sort() method:

trimmedfiles.sort(key=lambda f: int(f.split("_")[2]))
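That key looks only at the last field, so names from different runs would be interleaved by chunk number. A possible refinement, sketched under the assumption that every trimmed name has exactly three underscore-separated fields (accession, read number, chunk number), is to sort on all three. The name `chunk_key` is made up for this example.

```python
def chunk_key(name):
    """Sort key for trimmed names such as 'ERR001268_1_100'.

    Assumes exactly three '_'-separated fields; raises ValueError otherwise.
    """
    accession, read, chunk = name.split("_")
    # Accession compares as text; read and chunk compare as integers,
    # so chunk 7 sorts before chunk 20 and chunk 100.
    return (accession, int(read), int(chunk))


names = ["ERR001268_1_100", "ERR001268_1_7", "ERR001268_1_20", "ERR001267_1_3"]
names.sort(key=chunk_key)
print("\n".join(names))
```

Returning a tuple keeps each run's chunks together while still ordering them numerically within the run.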
Ben G.
    You probably want a `'\n'.join(...)` around the glob call. Otherwise, this answer got me out of a similar situation, +1 – quornian Dec 13 '12 at 20:52
0

You can use find.

Example:

find /Users/kunlun/Downloads/fu_neg/ -name "*.png" > /Users/kunlun/Downloads/fu_neg.txt
Mehdi
logan lu
0

find might help you. Rather than ls, use find . -name 'yourpatternhere' -print0 | xargs -0 youractionhere

j03m