My student wrote this simple program that calls system to cat some files, but for some reason it escapes the * metacharacter when it shouldn't.

#!/opt/anaconda3/bin/perl

# catfastq.pl

# the following 'use' lines make the script print warnings and control variable scoping (i.e. I have to place 'my' before a new variable declaration)
use warnings;
use strict;

# define array of letters that you will use in the file names
my @letter = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Z');

# store full path to where fastq files are in 'readdir' variable
my $readdir = '/home/data/madzays/finch_data/firstrun';

# store name of output directory in 'outdir' variable
my $outdir = '/home/data/madzays/finch_data/firstrun/combined';

# make output directory if it doesn't exist already
if (!-d $outdir)
{
        if (!mkdir($outdir))
        {
                print STDERR "Couldn't create output directory\n";
                exit 1;
        }
}

# loop through each element of the 'letter' array (note that the element will be stored in the 'lib' variable)
foreach my $lib (@letter)
{
        # the system function executes a command just as if you had typed it in the bash terminal

        # concatenate R1 files
        system("cat ${readdir}/RSFV1${lib}*R1_001.fastq > ${outdir}/RSFV1${lib}_R1.fastq");

        # concatenate R2 files
        system("cat ${readdir}/RSFV1${lib}*R2_001.fastq > ${outdir}/RSFV1${lib}_R2.fastq");
}

exit;

This is the output:

cat:/home/data/madzays/finch_data/firstrun/RSFV1S*R2_001.fastq: No such file or directory 
cat:/home/data/madzays/finch_data/firstrun/RSFV1T*R1_001.fastq: No such file or directory 
cat:/home/data/madzays/finch_data/firstrun/RSFV1T*R2_001.fastq: No such file or directory 
cat:/home/data/madzays/finch_data/firstrun/RSFV1U*R1_001.fastq: No such file or directory 
cat:/home/data/madzays/finch_data/firstrun/RSFV1U*R2_001.fastq: No such file or directory

The files are there, for example:

RSFV1S_S37_L005_R1.fastq  RSFV1S_S37_L005_R2.fastq  RSFV1S_S37_L006_R1.fastq  RSFV1S_S37_L006_R2.fastq  RSFV1S_S37_L007_R1.fastq  RSFV1S_S37_L007_R2.fastq

Any ideas what could be wrong?

  • Re "for some reason it escapes the * metacharacter when it shouldn't": it doesn't. `sh` doesn't expand `*` when there are no matches. – ikegami Mar 15 '18 at 03:13
  • Exactly, have a look at this link: https://stackoverflow.com/questions/19458104/how-do-i-pass-a-wildcard-parameter-to-a-bash-file. It is your shell that does the magic of interpreting wildcards, not the `cat` command. For `cat` it is completely transparent: it just receives a list of arguments from its parent process, the shell. In your case you are calling the `cat` command directly from the Perl process! Also check: http://tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm – Allan Mar 15 '18 at 03:16
  • @Allan is mistaken. You are involving the shell. It simply doesn't expand `*` if there are no matches. This can be seen using `sh -c 'echo nonexistent*'` – ikegami Mar 15 '18 at 03:24
  • @ikegami: really? So do you mean there is an intermediate step with a shell generation before the command is called and after it returns? I would like to know more about it! – Allan Mar 15 '18 at 03:26
  • @Allan, No, I'm saying `system(EXPR)` is short for `system('/bin/sh', '-c', EXPR)`. (There is an optimization that avoids the shell if EXPR contains no shell metacharacters other than whitespace, but that's not the case here.) – ikegami Mar 15 '18 at 03:28
  • @ikegami: Nice!!! I learned something today: `system("cat ${readdir}/RSFV1${lib}*R1_001.fastq > ${outdir}/RSFV1${lib}_R1.fastq")` is the same as `system('/bin/sh', '-c', "cat ${readdir}/RSFV1${lib}*R1_001.fastq > ${outdir}/RSFV1${lib}_R1.fastq")`. Deleting my answer! :-) Thanks a lot for this, I had no idea ;-) -> https://stackoverflow.com/questions/872230/how-does-system-exactly-work-in-linux – Allan Mar 15 '18 at 03:28
  • @ikegami there are matches tho :/ – Madza Farias-Virgens Mar 15 '18 at 03:31
  • Your shell disagrees, and I trust it more than you. You might be looking in the wrong directory, or there might be characters you don't see in their names. Or maybe they are there, but a permission issue prevents them from being seen, though I think that would give a different error message. – ikegami Mar 15 '18 at 03:31
  • @ikegami right on. prt sc added. – Madza Farias-Virgens Mar 15 '18 at 03:35
  • What's the output of `echo /home/data/madzays/finch_data/firstrun/RSFV1S*R2_001.fastq`? – ikegami Mar 15 '18 at 03:36
  • /home/data/madzays/finch_data/firstrun/RSFV1S*R2_001.fastq – Madza Farias-Virgens Mar 15 '18 at 03:37
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/166862/discussion-between-madza-yasodara-farias-virgens-and-ikegami). – Madza Farias-Virgens Mar 15 '18 at 03:47

1 Answer

Perl is not escaping the *. The issue is that sh doesn't find any matches, and it only expands * when it finds a match.

$ echo fi*le
fi*le

$ touch file

$ echo fi*le
file
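
The same check can be scripted. Below is a minimal sketch, assuming a POSIX-style `sh` with default globbing options (no `nullglob` or `failglob`); `count_matches` is a hypothetical helper name, not something from the original script. It counts how many existing files an unquoted pattern expands to, which is zero when the shell leaves the pattern as a literal string.

```shell
# count_matches PATTERN: print how many existing files an unquoted glob expands to.
# (Hypothetical helper for illustration; not part of the original script.)
count_matches() {
    n=0
    # $1 is left unquoted on purpose so the shell performs pathname expansion.
    # Caveat: this also word-splits, so it assumes paths without whitespace.
    for f in $1; do
        # When nothing matches, sh passes the pattern through unchanged,
        # so the loop runs once with a name that does not exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Called as `count_matches '/home/data/madzays/finch_data/firstrun/RSFV1S*R2_001.fastq'`, a result of 0 would mean the pattern itself is wrong for the files in that directory.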

The problem is that you are using

system("cat ${readdir}/RSFV1${lib}*R1_001.fastq > ${outdir}/RSFV1${lib}_R1.fastq");

system("cat ${readdir}/RSFV1${lib}*R2_001.fastq > ${outdir}/RSFV1${lib}_R2.fastq");

when you should be using

system("cat ${readdir}/RSFV1${lib}*001_R1.fastq > ${outdir}/RSFV1${lib}_R1.fastq");

system("cat ${readdir}/RSFV1${lib}*001_R2.fastq > ${outdir}/RSFV1${lib}_R2.fastq");
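
A side effect worth noting: the shell sets up the `>` redirection before `cat` runs, so a pattern that matches nothing still leaves an empty output file behind. One way to make a wrong pattern fail loudly instead is to check that the glob matched before running `cat`. This is only a sketch in `sh`, assuming default globbing options and paths without whitespace; `cat_glob` is a hypothetical name introduced here for illustration.

```shell
# cat_glob PATTERN OUTFILE: concatenate the files matching PATTERN into OUTFILE,
# but refuse to run cat (or create OUTFILE) when nothing matches.
# (Hypothetical helper for illustration; not part of the original script.)
cat_glob() {
    pattern=$1
    out=$2
    # Expand the pattern into the positional parameters (unquoted on purpose).
    set -- $pattern
    # With no match, "$1" is still the literal pattern, which does not exist.
    if [ ! -e "$1" ]; then
        echo "no files match: $pattern" >&2
        return 1
    fi
    cat "$@" > "$out"
}
```

The nonzero return status gives the caller something to test, in the same way the return value of `system` can be tested in the Perl script.
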
ikegami