
I have a file that looks like this, but much bigger:

xxxxxxx                                              xxxxx                 xxxx
yyyyyyy                                              yyyyy                 yyyy
zzzzzzz                                              zzzzz                 zzzz
bbbbbbb                                                                    bbbb

I want to compare the second and third columns, looking only at the first 4 digits, so I wrote something like this:

while IFS='' read -r line; do

a=${line:181:4}
b=${line:276:4}
str1="$a"
str2="$b"

# echo $a
# echo $b
# echo $str1
# echo $str2

if [[ "$str1" == "$str2" ]]
then

      echo $line >> $1.diffs.txt
fi

done <$1

The reason for the str* variables is that the numbers occasionally start with 0, so they get interpreted as octal and I get an error. My problem is that each line written to $1.diffs.txt contains only one white space between columns, instead of all of them, i.e.:

Output :

xxxxxxx                                              xxxxx              xxxx
yyyyyyy                                              yyyyy                 yyyy
zzzzzzz                                              zzzzz                 zzzz
onlyf
  • In your `if` statement, you need to quote `$line`, i.e. `echo "$line" >> ...`, to preserve the whitespace – arco444 Nov 27 '15 at 11:46
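
The quoting fix from the comment can be sketched as a small bash function. This is only an illustration under assumptions: `filter_matching`, `off1` and `off2` are hypothetical names, the defaults 181 and 276 are the 0-based offsets from the question, and the output file is overwritten rather than appended to.

```bash
#!/usr/bin/env bash
# Sketch: copy lines whose two 4-character fields match, keeping whitespace.
# filter_matching, off1 and off2 are hypothetical names; 181 and 276 are the
# 0-based offsets used in the question.
filter_matching() {
    local infile=$1 outfile=$2 off1=${3:-181} off2=${4:-276}
    local line a b
    # IFS= and -r keep leading/trailing blanks and backslashes untouched.
    while IFS= read -r line || [[ -n $line ]]; do
        a=${line:off1:4}
        b=${line:off2:4}
        # [[ == ]] with a quoted right-hand side compares plain strings,
        # so a leading 0 is never parsed as an octal number.
        if [[ "$a" == "$b" ]]; then
            # Quoting "$line" prevents word splitting, which is what
            # collapses runs of spaces in an unquoted echo $line.
            printf '%s\n' "$line"
        fi
    done < "$infile" > "$outfile"
}
```

Inside the original script it could be called as `filter_matching "$1" "$1.diffs.txt"`. Redirecting once after `done` also avoids reopening the output file for every matching line.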

1 Answer

You could use awk in place of your existing shell script:

awk '{ if (substr($2,1,4) == substr($3,1,4)) { print $0 }}' "$1" > "$1.diffs.txt"

awk splits on whitespace by default, so you can compare the fields $2 and $3 directly instead of having to define the substring positions manually. Note that substr positions in awk start at 1; a start of 0 does not behave the same way in every awk implementation.

arco444
  • My problem is that column 2 is not always filled; sometimes it's just blanks. I wonder if awk will "hit" an error on that! – onlyf Nov 27 '15 at 11:42
  • Yes it would. You should update your question to make it clearer; this situation was not mentioned. – arco444 Nov 27 '15 at 11:43
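
When column 2 is blank, whitespace splitting shifts the fields, so `$2` and `$3` no longer refer to the intended columns. One possible workaround, sketched here under assumptions, is to compare by character position on the whole line instead. `compare_fixed_width`, `s1` and `s2` are hypothetical names, and the defaults 182 and 277 are the question's 0-based offsets 181 and 276 plus one, since awk's `substr` counts from 1.

```bash
# Sketch: compare fixed-width fields by character position rather than by
# whitespace-separated field number, so a blank column does not shift anything.
# compare_fixed_width, s1 and s2 are hypothetical names.
compare_fixed_width() {
    awk -v s1="${2:-182}" -v s2="${3:-277}" '
        # substr() returns strings, so this is a string comparison and a
        # leading 0 is harmless. A pattern with no action prints the line
        # unchanged, whitespace included.
        substr($0, s1, 4) == substr($0, s2, 4)
    ' "$1"
}
```

It could be used as `compare_fixed_width "$1" > "$1.diffs.txt"`. Lines where both positions are blank would also compare equal, so an extra check may be needed if those should be skipped.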