
I am trying to delete a few columns and then deduplicate the file contents. The columns I want to delete are the month, day, time, and epoch time; these differ on every line and prevent me from deduplicating the file contents.

Sample contents of sample.log:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:14 AAA AAA AAAA 1433495594.306612 XXXX CCCC CCCC AAAA SDDDD DFFFFF222
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:16 AAA AAA AAAA XXXXX 1433495597.306615 XXXX CCCC CCCC AAAA SDDDD DFFFFF333
Jun  5 05:13:17 AAA AAA AAAA XXXXX 1433495598.306616 XXXX CCCC CCCC AAAA SDDDD DFFFFF444

Issue:

The month, day, and time are in fixed columns; however, the epoch time toggles between columns 7 and 8. I want to know how to deal with this.

Sample output:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

If the above is too much to ask, then output like the following would also do:

AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

I have been trying something along the following lines, but it has not been very helpful:

while read line
do
    seven=$(echo $line |awk '{print $7}')
    eight=$(echo $line |awk '{print $8}')

    if [[ "$seven" =~ "^[0-9]" ]];then
        #echo "seventh column starts with number"
        echo $line|awk '$1=$2=$3=$7=" " {print}'
    else
        #echo "Eighth column starts with number"
        echo $line|awk '$1=$2=$3=$8=" " {print}'
    fi
done < $1

Another example:

Input file contents:

Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

Output:

Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF

OR

Output:

 AAA BBB CCC  DDD EEE FFFF
 AAA BBB CCC  DDD EEE GGGG
 AAA BBB CCC XXX  DDD EEE GGGG
 AAA BBB CCC XXX  DDD EEE FFFF
P....
  • Note that instead of saying `while read line ...; seven=$(echo $line | awk '{print $7}'` you can always do `while read field1 field2 ... field7 field8`. – fedorqui May 23 '16 at 11:33
  • Thanks for the info, mine was really ugly! – P.... May 23 '16 at 11:35
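The suggestion in the comment above can be sketched as follows. This is only an illustration, not code from the thread: the two input lines are a shortened, hypothetical sample in the same layout as sample.log, and the names `strip_columns`, `f4` to `f7` and `rest` are made up for the example. `read` splits each line on whitespace and puts whatever is left over into the last variable.

```shell
# Hypothetical two-line sample in the same layout as sample.log.
input=$(mktemp)
cat > "$input" <<'EOF'
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC
EOF

# strip_columns (hypothetical helper): read names the fields directly,
# so no per-line call to awk is needed to pick out column 7.
strip_columns() {
  while read -r month day time f4 f5 f6 f7 rest; do
    case $f7 in
      [0-9]*) echo "$f4 $f5 $f6 $rest" ;;           # column 7 is the epoch: drop it
      *)      echo "$f4 $f5 $f6 $f7 ${rest#* }" ;;  # column 8 is the epoch: cut it off the rest
    esac
  done
}

result=$(strip_columns < "$input")
printf '%s\n' "$result"
rm -f "$input"
```

For anything beyond a small file, a single awk process as in the answers is likely to be much faster than a shell loop.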

3 Answers


A very basic approach is to check the format of the field: if it consists of digits, a dot, and more digits, that's the one!

awk '{$1=$2=$3=""
      if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""}
      else {$8=""}
     } 1' file

Note that this leaves some extra spaces, because when you empty a field awk rebuilds the record and the output separators (OFS) on either side of the emptied field remain. For a clean removal of columns, check Ed Morton's answer to Print all but the first three columns.
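One way to get rid of the leftover spaces, shown here as a sketch and not as the method from the linked answer, is to let awk re-split and rebuild the record after the fields have been emptied. The two input lines are a shortened, hypothetical sample in the layout of sample.log.

```shell
# Hypothetical two-line sample in the same layout as sample.log.
input=$(mktemp)
cat > "$input" <<'EOF'
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC
EOF

result=$(awk '{
  $1 = $2 = $3 = ""                                   # blank month, day and time
  if ($7 ~ /^[0-9]+\.[0-9]+$/) $7 = ""; else $8 = ""  # blank whichever field holds the epoch
  $0 = $0    # re-split the record on whitespace, so the emptied fields disappear
  $1 = $1    # force a rebuild of the record with a single OFS between fields
  print
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

This relies on the default field splitting (runs of blanks), so it would need adjusting for input with a custom FS.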


To make sure that the combination of the 1st, 2nd, 3rd and last columns does not repeat, use the awk '!uniq[$0]++' file approach, keyed on those columns rather than on the whole line:

awk '!uniq[$1 $2 $3 $(NF-4) $(NF-2) $(NF-1) $NF]++ {
      $1=$2=$3=""
      if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""}
      else {$8=""}
      print   # print only the first line seen for each key
     }' file
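If the goal is instead to ignore the date, time and epoch when comparing lines but still print the first matching line in its original form, a variation along these lines may help. It is a sketch under that assumption, not the code agreed on in the thread, and the four input lines are a shortened, hypothetical sample.

```shell
# Hypothetical sample: lines 1 and 3 differ only in time and epoch.
input=$(mktemp)
cat > "$input" <<'EOF'
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX DFFFFF111
Jun  5 05:13:14 AAA AAA AAAA 1433495594.306612 XXXX DFFFFF222
Jun  5 05:13:18 AAA AAA AAAA 1433495599.306617 XXXX DFFFFF111
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX DFFFFF111
EOF

result=$(awk '{
  line = $0                                           # keep the untouched line for printing
  $1 = $2 = $3 = ""                                   # blank month, day and time
  if ($7 ~ /^[0-9]+\.[0-9]+$/) $7 = ""; else $8 = ""  # blank the epoch field
  if (!seen[$0]++) print line                         # first occurrence of this reduced key
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because the key is the rebuilt record with the variable fields emptied, the input order is preserved and no sort is needed.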
fedorqui
  • @RandomUser why would you need that? Explain what you want to do, because most probably `awk` is going to be able to do it. – fedorqui May 23 '16 at 11:37
  • I want to find the unique log entries (only one occurrence of each), but the day, date, and epoch are preventing me from doing that. – P.... May 23 '16 at 11:39
  • @RandomUser ok, this should be easy with awk. Provide a [mcve] of this so I can check. That is, sample input (you already have it) and desired output. – fedorqui May 23 '16 at 11:41
  • @RandomUser ok! Added approach to make sure no repeated rows appear. – fedorqui May 23 '16 at 12:37
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/112690/discussion-between-randomuser-and-fedorqui). – P.... May 23 '16 at 13:13

If I'm understanding the question correctly there is no need for Bash here, just Awk:

% awk '
{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample.log          
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF222 
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF333 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF444 

To grab the unique rows:

% awk '             
{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD EEE FFFF 
AAA BBB CCC DDD EEE GGGG 
AAA BBB CCC XXX DDD EEE FFFF 
AAA BBB CCC XXX DDD EEE GGGG 

On handling %s...

If your input file contains % signs, per your comment, you'll need to escape these before passing them into printf. You could do that with a function like this...

% awk '             
function escape_percents(s) 
{ 
    gsub("%", "%%", s) 
    return s
}

{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf escape_percents($f) " "
            }
        } else {
            printf escape_percents($f) " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD %E%E%E FFFF 
AAA BBB CCC DDD %E%E%E GGGG 
AAA BBB CCC XXX DDD %E%E%E FFFF 
AAA BBB CCC XXX DDD %E%E%E GGGG 
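An alternative to escaping, given here as a sketch and not as part of the answer above, is to keep the format string fixed and pass each field as an argument, so that a `%` in the data is never interpreted by printf. The two input lines are hypothetical and only mimic the `%E%E%E` fields shown in the output above; the `sep` variable is introduced to avoid the trailing space.

```shell
# Hypothetical sample with literal percent signs in one field.
input=$(mktemp)
cat > "$input" <<'EOF'
Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD %E%E%E FFFF
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD %E%E%E GGGG
EOF

result=$(awk '{
  sep = ""
  for (f = 4; f <= NF; ++f) {                 # start at column 4
    # Skip column 7 or 8 when it looks like an epoch timestamp.
    if ((f == 7 || f == 8) && $f ~ /^[0-9]+\.[0-9]+$/) continue
    printf "%s%s", sep, $f                    # the field is an argument, never the format
    sep = " "
  }
  print ""
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Piping the output through `sort -u` as before would then give the unique rows.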
johnsyweb
  • awk: cmd. line:9: (FILENAME=samplelog FNR=51452) fatal: not enough arguments to satisfy format string `%ERROR: ' ^ ran out for this one – P.... May 23 '16 at 12:11
  • Note `printf "\n"` can be also said like `print ""`. – fedorqui May 23 '16 at 12:27
  • @RandomUser: Let me know how you fix that! :-) – johnsyweb May 23 '16 at 22:50
  • @fedorqui: Yeah, I considered that implementation but favoured the explicit `\n` for clarity. – johnsyweb May 23 '16 at 22:51
  • @Johnsyweb: I am not very familiar with awk, so I used a dirty workaround that removes the problem area rather than correcting the problem: I ran `sed -i 's/%E/E/g' fileName` before applying your solution. :/ – P.... May 24 '16 at 14:09
  • @RandomUser That may be problematic because you'd be conflating `%E` with `E` when looking for unique values. See my updated solution for a different approach... – johnsyweb May 25 '16 at 11:44
  • Curious as to why this was downvoted. It seemed to help @PS.! – johnsyweb Oct 03 '16 at 22:17
  • Indeed it helped; it wasn't me who downvoted. – P.... Oct 04 '16 at 04:40

If the number of columns after the epoch time stays constant, then the easiest way is to address the fields relative to NF, counting back from the end of the line.

Using the input from the second example in the question:

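awk '{
    NewLine = $4                      # start the output at column 4
    # Walk the remaining columns, counting back from the end of the line.
    for (i = NF - 5; i >= 0; i--) {
        if (i != 3) {                 # $(NF-3) is the epoch time: skip it
            NewLine = NewLine " " $(NF - i)
        }
    }
    print NewLine
}' Sample.log | sort | uniq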

Using the input

Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

you will get

AAA BBB CCC DDD EEE FFFF
AAA BBB CCC DDD EEE GGGG
AAA BBB CCC XXX DDD EEE FFFF
AAA BBB CCC XXX DDD EEE GGGG
FoldedChromatin