
I have nearly 30,000 .pdb files that need to be passed as input to a C program (in a Linux environment) to get the desired output. For confidentiality reasons I am not able to share the program. Could anyone suggest how to get the output for all of the files in one go?

NOTE: I am able to run the program on a single file, i.e.:

./file_name x.pdb

Since there are thousands of input files, it is not practical to run the program on each file by hand.

STF
Laxmi
  • Sorry I don't get your problem. – ckruczek Mar 21 '16 at 06:38
  • `./filename *.pdb`? Or `for a in \`ls -1 *.pdb\`; do ./filename $a; done`? – M Oehm Mar 21 '16 at 06:40
  • Maybe you could generate a list of all files to process into a single text file, and then pass the path to this single file as the only argument of your program. – jdarthenay Mar 21 '16 at 06:40
  • You can write code that reads the folder containing the files and passes each one of them to the C program. – STF Mar 21 '16 at 06:40
  • @MOehm: The number of files to expand to might be limited by the shell. – alk Mar 21 '16 at 06:43
  • Well, you could read a wildcard specification from argv[1], then run a popen() with the constructed command "ls ..." using that argv[1] spec, and then just munge through all the resulting files. – John Forkosh Mar 21 '16 at 06:43
  • @alk: Right. Then write a shell script that treats the files in manageable chunks. – M Oehm Mar 21 '16 at 06:44
  • What keeps you from listing all file names in a single file and passing the latter? Then open it inside the program and read the names line by line. – alk Mar 21 '16 at 06:44
  • Use the opendir and readdir functions to open and read the directory contents, filter for the files you need to process (the .pdb files), and use them in your code. – Ash Mar 21 '16 at 07:13
  • There are definitely many ways of doing this, so I'm voting to close as too broad. – Martin James Mar 21 '16 at 07:43

2 Answers


@M Oehm is right. You can use `./filename *.pdb`, provided the program processes every file name it receives as an argument rather than only the first one.

Keep in mind, though, that the command-line length limit differs from system to system. You can check it with `xargs --show-limits`.

Output on my Ubuntu 14.04 system:

xargs --show-limits
Your environment variables take up 3140 bytes
POSIX upper limit on argument length (this system): 2091964
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2088824
Size of command buffer we are actually using: 131072

So, if the combined length of the 30,000 .pdb file names (including the spaces between them) fits within your system's command-line length limit, the solution above will work.
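As a rough way to check this before running anything, the sketch below adds up the bytes needed for the file names and compares the total with the limit reported by `getconf ARG_MAX`. The function names `pdb_name_bytes` and `names_fit` are made up for this illustration, GNU `find` is assumed for `-printf`, and the estimate ignores the per-argument overhead the kernel may add, so treat a result close to the limit as "too long".

```shell
# pdb_name_bytes DIR: bytes needed for the *.pdb names directly inside DIR,
# counting one extra byte per name for the separator.
pdb_name_bytes() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.pdb' -printf '%f\n' | wc -c
}

# names_fit DIR: print "fits" or "too long" (rough estimate only).
names_fit() {
    limit=$(getconf ARG_MAX)     # system limit for arguments plus environment
    env_bytes=$(env | wc -c)     # approximate space taken by the environment
    used=$(( $(pdb_name_bytes "$1") + env_bytes ))
    if [ "$used" -lt "$limit" ]; then
        echo "fits"
    else
        echo "too long"
    fi
}
```

Running `names_fit .` in the directory that holds the .pdb files prints one of the two verdicts.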

Otherwise, if you are free to pass each file to your binary independently, you can write a small script that calls the binary once per file.

# Let the shell expand the pattern itself; `ls *.pdb` would hit the same length limit.
for file in *.pdb
do
    ./filename "$file"
done
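The same loop can be wrapped in a small function that also copes with an empty directory and reports files for which the program fails. This is only a sketch: `run_each` is a hypothetical name, and it assumes the program signals failure through a non-zero exit status.

```shell
# run_each PROG DIR: call PROG once for every *.pdb file directly inside DIR.
# Files for which PROG exits with a non-zero status are reported on stderr.
run_each() {
    prog=$1
    dir=${2:-.}
    for file in "$dir"/*.pdb; do
        # With no matching files the pattern stays literal; skip that case.
        [ -e "$file" ] || continue
        "$prog" "$file" || echo "failed: $file" >&2
    done
}
```

For the question's setup the call would be `run_each ./filename .`; because the shell expands the pattern inside the loop, no single command line ever has to hold all 30,000 names.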
Community
InvisibleWolf

xargs should be able to solve your problem. Follow the steps below:

  1. Go to the directory where the .pdb files are located.
  2. Run the command below in a terminal.

    find . -name "*.pdb" | xargs ./a.out
    

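    If the command above handles only the first file, the program most likely reads just its first argument: xargs passes many file names to each invocation, and on its own splits a long list into several invocations so that the command-line limit is not exceeded.

In that case, you can use awk to call the program once per file: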

find . -name "*.pdb" | awk -F"/" '{system ("./a.out " $2)}'

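Here ./a.out can be replaced with the full path of your C executable, and $2 is the file name that follows the leading ./ in find's output, so this assumes the files sit directly in the current directory and have no spaces in their names.

It will process all .pdb files one by one with a single command.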
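An alternative that stays with xargs is to ask it for one file per invocation and to sort the list first, since find returns names in directory order rather than alphabetically. The sketch below assumes GNU find, sort and xargs (for `-print0`, `-z`, `-0` and `-r`); `run_sorted` is a hypothetical name, and unlike the awk command it also picks up files in subdirectories.

```shell
# run_sorted PROG DIR: run PROG once per *.pdb file under DIR, in sorted order.
run_sorted() {
    prog=$1
    dir=${2:-.}
    # NUL-separated names survive spaces and other unusual characters.
    find "$dir" -type f -name '*.pdb' -print0 |
        LC_ALL=C sort -z |
        xargs -0 -r -n 1 "$prog"
}
```

With the executable from the answer this would be called as `run_sorted ./a.out .`.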

cheers

HuntM
  • `ls *.pdb` might fail from a certain number of files, size of the result of expansion. – alk Mar 21 '16 at 07:20
  • @alk I didn't get your point. – niyasc Mar 21 '16 at 07:22
  • Try it: create some 30000 files and then do an `ls *`. I remember an environment where not even 10000 file names were possible to name via shell expansion `*`. Also the size of the command line is limited. All those limits exist and depend on the shell in use and its configuration. – alk Mar 21 '16 at 07:31
  • @HuntM I have used xargs, but it still takes only the first pdb file. I have to compile the program first, so I did that using "gcc program_name.c -o somename" and then ran the command you mentioned above. – Laxmi Mar 21 '16 at 08:49
  • @Laxmi Can you try with modified command? If it doesn't work your file name exceeds max command line characters for Xargs and you have to write a script for it. – HuntM Mar 21 '16 at 09:23
  • @HuntM Thank you, thanks a ton, you made my day. It is working with "awk". – Laxmi Mar 22 '16 at 03:28
  • A small question: what should I do to make it take the files from the beginning? It is currently taking files from the middle. – Laxmi Mar 22 '16 at 03:31
  • Just run find command and make sure it is listing all required files in ./filename format. If any files are in recursive directory, you need to change command. Please upvote if it helped. – HuntM Mar 22 '16 at 03:55
  • It is listing files but not all – Laxmi Mar 22 '16 at 03:59