-1

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6"; 
user3781528

5 Answers

1

Since you need an exact position and know the string length, substr can pick out the text to test:

perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename

This prints a line only when the three-character string starting at the 71st character does not match either 0/0 or ./.

The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, and not capturing. It will work fine also without ?: which is there only for efficiency's sake.

zdim
  • True, if _column 71_ means: 71st character in the line. – PerlDuck Jun 07 '16 at 19:16
  • @zdim. my $cmd6 = `perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' $currenttsvfile > $MDLtsvfile`; print "$cmd6"; gave me errors when I ran it from my Perl script. – user3781528 Jun 07 '16 at 19:42
  • @user3781528 Sorry that I didn't respond to your message -- I just didn't see it until now. (I think it's because there is a period after username?) You did get an explanation and your good answer so all is well :) – zdim Jun 08 '16 at 00:16
1
perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
Borodin
  • True, if _column 71_ means: 71st tab-separated field. – PerlDuck Jun 07 '16 at 19:16
  • @PerlDog: We're told it's a tab-separated file, and column number rarely means character position. Besides, character position is very vague if there are tabs involved, especially when we don't know the size of the tab stops – Borodin Jun 07 '16 at 19:26
  • Given that, your answer is right. But zdim's answer was also upvoted so someone must have thought it's the 71st character. – PerlDuck Jun 07 '16 at 19:35
  • I get "Global symbol "@F" requires explicit package name" error – user3781528 Jun 07 '16 at 20:04
  • @user3781528: That's a self-contained Perl program. You should enter it on the command line, not as part of a script file – Borodin Jun 07 '16 at 20:10
  • @user3781528 That's because you are calling borodin's and zdim's Perl one-liners from within your Perl program. Don't do that. Please see my answer. – PerlDuck Jun 07 '16 at 20:28
1

You can call a one-liner, as Borodin and zdim suggested. Which one is right for you is still not clear, because you haven't said whether "71st column" means the 71st tab-separated field of a line or the 71st character of that line. Consider

12345\t6789

Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789 while zdim's assumes it's 2. Each shows a solution for one case, but both are stand-alone programs meant to be run from the command line.
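The two readings can be checked from the shell. This is only a minimal sketch on the sample line above, using cut rather than Perl: cut -c selects by character position and cut -f selects by tab-separated field. The variable names are made up for illustration.

```shell
# Sample line from above: "12345", a tab, then "6789".
line=$(printf '12345\t6789')

# Character-based reading: the 2nd character of the line.
by_char=$(printf '%s\n' "$line" | cut -c2)

# Field-based reading: the 2nd tab-separated field (tab is cut's default delimiter).
by_field=$(printf '%s\n' "$line" | cut -f2)

echo "character 2: $by_char"
echo "field 2:     $by_field"
```

The character-based reading gives 2 and the field-based reading gives 6789, which is exactly the difference between the two one-liners.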

If you want to integrate that into your Perl script you could do it like this:

Replace this line:

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6"; 

with this snippet:

open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {

    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});

    # tab/field-based (split on tabs only, since the file is tab-delimited):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);

Use either the character-based line or the tab/field-based lines. Not both!

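Borodin and zdim condensed this snippet to a one-liner, but don't call that from inside backticks in a Perl script. Backticks interpolate like a double-quoted string, so your script's own $_ and $F[70] are substituted before the shell ever sees the command. That is where the "Global symbol @F" error comes from.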

PerlDuck
  • I'm ashamed because I only did a mash-up of two very good answers. – PerlDuck Jun 07 '16 at 20:52
  • @PerlDog This is a very good post and a perfectly fine answer in my opinion -- you pulled together what is needed and in fact answered the question in a suitable way. All good I say :) I find it interesting that we never were told which way it is. So even after all is said and done, and past and gone, we still don't know for sure. – zdim Jun 09 '16 at 18:30
0

Try it!

awk -F'\t' '{ if ($71 != "./." && $71 != "0/0") print; }' old_file.txt > new_file.txt
0

The problem with your command is that you are attempting to capture the output of a command which produces no output: everything that survives the filter is redirected to a file, so that is where all the output goes and there is nothing left to capture.
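The same effect can be seen in the shell, where $(...) captures standard output in the same way Perl's backticks do. This is a small sketch with a scratch file from mktemp; the file and variable names are purely illustrative.

```shell
# Scratch file that receives the redirected output.
tmpfile=$(mktemp)

# stdout is redirected to the file, so the substitution captures nothing.
captured=$(printf 'kept line\n' > "$tmpfile")

# The text ended up in the file instead.
file_content=$(cat "$tmpfile")

echo "captured: [$captured]"
echo "in file:  [$file_content]"

rm -f "$tmpfile"
```

The captured string is empty while the file holds the line, which is why printing $cmd6 shows nothing even when the filter itself works.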

Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.

If you do want a single shell command,

grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file

would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
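To see how the anchored pattern behaves, here is a scaled-down sketch that tests field 3 instead of field 71, so the repeat count is {2} rather than {70}. The sample data is invented, and the tab is built with printf so the example does not depend on bash's $'...' quoting.

```shell
# A literal tab character, portable across POSIX shells.
tab=$(printf '\t')

# Four invented rows; the value under test sits in field 3.
sample=$(printf 'a\tb\t0/0\tx\nc\td\t0/1\ty\ne\tf\t./.\tz\ng\t0/0\t1/1\tw\n')

# Skip exactly two tab-terminated fields, then reject 0/0 or ./. in field 3.
kept=$(printf '%s\n' "$sample" | grep -Ev "^([^${tab}]*${tab}){2}(\./\.|0/0)${tab}")

printf '%s\n' "$kept"
```

Rows c and g survive. Row g has 0/0 in field 2, which a plain fgrep -v "0/0" would have dropped, but the anchored pattern only looks at the one field it counts its way to.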

tripleee