0

I came across an error in a dataset today and would like other people's opinions on a solution beyond the one I have. The last column/field of the first and second row/record should hold the same value, and the second-to-last column/field of row/record 1 should always be "1". The problem is what to do when this is not so, and which steps are needed to correct it.

The incorrect data, in a file called "sample.txt", looks like this:

5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2@11@14 1/4
3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5    @2 @4 3/4
2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8    @3 @6 1/2
4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2@4 @6 1/2
1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2@5 @8 1/4

The correct data should look like this:

 5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2 @1@4 3/4
 3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5     @2@4 3/4
 2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8     @3@6 1/2
 4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2 @4@6 1/2
 1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2 @5@8 1/4

My current solution is a multi-step process that I have a feeling can be streamlined. Any suggestions are highly appreciated.

1) Create a bash variable holding the last field of row 2:

length=$(awk -F@ 'NR==2{print $NF}' sample.txt)

2) Create a file with the corrected row 1:

awk -F@ -v l="$length" 'NR==1{$(NF-1)=1;$NF=l;print $0}' OFS=@ sample.txt >sample1.txt

3) Append the remaining rows to that file:

awk -F@ 'NR>1{print $0}' sample.txt >>sample1.txt

Is there an awk, sed, or Perl one-liner (or a combination of pipes) that can accomplish the three steps above in one command?

Thomas Paine
  • As the delim is `@` does that mean that the spaces are required? – 123 Feb 01 '16 at 08:12
  • Also your commands would not give your expected output – 123 Feb 01 '16 at 08:15
  • Yes, the spaces are required. This is because of math equations that are executed on the database elsewhere by other code. Without the spaces it will be a pain to convert the fractions to decimals before the use of multiplication. – Thomas Paine Feb 01 '16 at 20:28

3 Answers

1

If I have understood you correctly, this program will do as you wish.

It reads the first two lines from the file and replaces the last two fields of the first line with 1 and the last field of the second line, respectively. Then it prints those two lines and copies the rest of the file.

The path to the input file is expected as a parameter on the command line.

use strict;
use warnings 'all';

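my $line1 = <>;
my $line2 = <>;
# List context: capture the text after the last '@' of line 2 (newline excluded)
my ($val) = $line2 =~ /.+\@(.+)/;

# Replace the last two fields of line 1 with "1 " and the captured value
$line1 =~ s/\@[^\@]*\@[^\@]*$/\@1 \@$val\n/;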

print $line1;
print $line2;

print while <>;

output

5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2@1 @4 3/4
3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5    @2 @4 3/4
2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8    @3 @6 1/2
4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2@4 @6 1/2
1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2@5 @8 1/4
Borodin
  • What does having $val in brackets do? Also you might want to mention that you are using perl... – 123 Feb 01 '16 at 08:54
  • The parentheses are necessary for scoping the `my` declaration. – tripleee Feb 01 '16 at 09:57
  • @tripleee I don't understand what you mean, how does it affect the scope? – 123 Feb 01 '16 at 10:16
  • 1
    @123 http://stackoverflow.com/questions/10031455/using-my-with-parentheses-and-only-one-variable – tripleee Feb 01 '16 at 11:46
  • 1
    @tripleee thanks, that clears it up. I think we have different definitions of scope though! – 123 Feb 01 '16 at 11:55
  • 2
    @123: You're right, it has nothing to do with scope. It serves to put the expression on the right-hand side in *list context* so that the capture is copied into `$val` instead of just a true/false to say whether it matched – Borodin Feb 01 '16 at 15:09
0

If I have understood you correctly, this awk one-liner will do as you wish:

awk -F@ -v OFS="@" 'NR==1{$12=$12-10; $13=$13-10 " 3/4";}{$11=$11" "; sub(" ", "", $12);}1' sample.txt

Output:

5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2 @1@4 3/4
3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5     @2@4 3/4
2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8     @3@6 1/2
4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2 @4@6 1/2
1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2 @5@8 1/4
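
The subtractions in that one-liner are tied to this particular sample (11 and 14 1/4 happen to be 10 more than the wanted values). Below is a minimal sketch of a variant that takes the value from row 2 instead. It is not the answerer's code: it assumes the file may be read twice, and `fixed.txt` is a hypothetical output name.

```shell
# Recreate the sample data from the question.
cat > sample.txt <<'EOF'
5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2@11@14 1/4
3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5    @2 @4 3/4
2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8    @3 @6 1/2
4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2@4 @6 1/2
1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2@5 @8 1/4
EOF

# sample.txt is named twice: the first pass only remembers the last field
# of row 2, the second pass rewrites and prints every row.
# fixed.txt is a hypothetical output name.
awk -F@ -v OFS=@ '
    NR == FNR { if (FNR == 2) last = $NF; next }       # pass 1: collect, print nothing
    FNR == 1  { $(NF-1) = 1; $NF = last }              # pass 2: repair row 1
    { $(NF-2) = $(NF-2) " "; sub(/ /, "", $(NF-1)) }   # move the pad space one field left
    1                                                  # print the (modified) row
' sample.txt sample.txt > fixed.txt

cat fixed.txt
```

With the sample data this should reproduce the five lines shown above. The fields are addressed relative to `NF` rather than as `$11` to `$13`, which is a design choice of the sketch and assumes the three columns of interest are always the last three.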
Firefly
-1

You can combine all three commands on one line by separating them with semicolons, like this:

length=$(awk -F@ 'NR==2{print $NF}' sample.txt); awk -F@ -v l="$length" 'NR==1{$(NF-1)=1;$NF=l;print $0}' OFS=@ sample.txt; awk -F@ 'NR>1{print $0}' sample.txt
Murad Tagirov
  • 1
    You're right... One can combine the variable declaration and the editing of the first row/record in one command. That answers what I thought was possible, yet could find no examples of, that is, how to declare an awk variable and use the variable in one command. Also, I was not aware that one awk's output could be another awk's input by using a semicolon... Much thanks for this example, as it will be used in more solutions than this. A true AWK one-liner of one-liners – Thomas Paine Feb 01 '16 at 20:15
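
For reference, the three steps from the question can also be folded into a single awk invocation that reads the file once. This is a sketch only, assuming the file has at least two rows; the variable names `first`, `row2` and `last` are made up for the example.

```shell
# Recreate the sample data from the question.
cat > sample.txt <<'EOF'
5@Comedia   @5@3@2@3@1/2  @3@1.6  @1@2 1/2@11@14 1/4
3@Melanistic@3@4@2@4@1 1/2@4@2 3/4@3@5    @2 @4 3/4
2@Pure      @4@5@5@5@3 1/2@5@4 3/4@5@8    @3 @6 1/2
4@Profit    @2@2@1@2@1.6  @1@1.6  @2@2 1/2@4 @6 1/2
1@Whammy    @1@1@1@1@1.6  @2@1.6  @4@5 1/2@5 @8 1/4
EOF

awk -F@ -v OFS=@ '
    NR == 1 { first = $0; next }      # hold row 1 back until row 2 is known
    NR == 2 {
        last = $NF                    # last field of row 2
        row2 = $0
        $0 = first                    # re-split the held row 1 into fields
        $(NF-1) = 1; $NF = last       # same assignments as step 2 of the question
        print; print row2
        next
    }
    { print }                         # rows 3 and later pass through unchanged
' sample.txt > sample1.txt

cat sample1.txt
```

Like the original three steps, this does not re-pad the columns, so row 1 ends in `@2 1/2@1@4 3/4` and the other rows are copied as they are.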