I have a list of ranges, and I am trying to merge subsequent entries which lie within a given distance of each other.

In my data, the first column contains the lower bound of the range and the second column contains the upper bound.
The logic is as follows: if the value in column 1 is less than or equal to the value in column 2 of the previous row plus a given value, print column 1 of the previous row and column 2 of the current row.

If the two ranges lie within the distance specified by the variable 'dist', they should be merged, else the rows should be printed as they are.

Input:    
1   10  
9   19  
51  60

With dist=10, the desired output is:
1   19  
51  60  

Using bash, I've tried things along these lines:

dist=10  
awk '$1 -le (p + ${dist}) { print q, $2 } {p=$2;} {q=$1} ' input.txt > output.txt

This returns syntax errors.

Any help appreciated!

Sundeep
Marla
  • Quick comment: you cannot use `bash` syntax in `awk`; they are completely different things. Use `<`, `<=`, etc. for comparison. – Sundeep Oct 02 '17 at 11:54
  • See https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script for passing shell variables to awk. – Sundeep Oct 02 '17 at 11:55
  • awk is not shell. It's a completely different tool with its own syntax, semantics, and scope. Get Effective Awk Programming, 4th Edition, by Arnold Robbins to start learning how to write awk programs. – Ed Morton Oct 02 '17 at 12:17
  • What if you have two consecutive matches? i.e., the second line should be merged with the first, but then the third should be merged with the second, or even the first. This won't compress the line in the middle out. Is that ok? – Paul Hodges Oct 02 '17 at 13:23
  • Yes, @PaulHodges that would be perfect. – Marla Oct 02 '17 at 13:33
  • What would be perfect? Please edit your question to include the case where a range spans multiple rows in your sample input/output. – Ed Morton Oct 02 '17 at 13:38
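
The two points raised in the comments above (awk uses `<=` rather than `-le`, and shell variables are passed in with `-v`) can be illustrated with a minimal sketch. The function name `flag_close_rows` and the output label are made up for illustration; the script only reports rows that start within `dist` of the previous row's upper bound and does not attempt the merge itself.

```shell
# flag_close_rows is a hypothetical helper name, not something from the thread.
# Usage: flag_close_rows DIST < input.txt
flag_close_rows() {
    # -v copies the shell value "$1" into the awk variable "dist";
    # inside awk, comparisons use <=, not the shell test operator -le.
    awk -v dist="$1" '
        NR > 1 && $1 <= prev_hi + dist { print "close to previous row:", $0 }
        { prev_hi = $2 }    # remember the upper bound of this row for the next one
    '
}

# Sample data from the question; prints "close to previous row: 9 19".
printf '1 10\n9 19\n51 60\n' | flag_close_rows 10
```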

1 Answer

This assumes that when the condition holds for two consecutive pairs of records (i.e. three records in a row), the third record treats the merged output of records 1 and 2 as its previous record.

awk -v dist=10 'FNR==1{prev_1=$1; prev_2=$2; next} ($1<=prev_2+dist){print prev_1,$2; prev_2=$2;next} {prev_1=$1; prev_2=$2}1' file

Input :

$ cat file
1 10
9 19
10 30
51 60

Output:

1 19
1 30
51 60
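
Note that the one-liner above prints a line on every merge (hence both `1 19` and `1 30`), and it prints the first row only if a later row merges into it. If the goal is instead one line per fully merged range, a variant along the following lines may be closer. This is a sketch under assumptions the question does not confirm: that chained ranges should collapse into a single row, that the input is sorted by lower bound, and that there are no blank lines. The function name `merge_ranges` is hypothetical.

```shell
# merge_ranges is a hypothetical helper name, not from the original answer.
# Usage: merge_ranges DIST [FILE...]   (reads stdin when no file is given)
merge_ranges() {
    dist=$1
    shift
    awk -v dist="$dist" '
        NR == 1 { lo = $1; hi = $2; next }            # start the first range
        $1 <= hi + dist {                             # row is within dist of current range
            if ($2 > hi) hi = $2                      # extend the upper bound if needed
            next
        }
        { print lo, hi; lo = $1; hi = $2 }            # gap found: emit range, start a new one
        END { if (NR > 0) print lo, hi }              # emit the last open range
    ' "$@"
}

# Same sample input as above; prints "1 30" and then "51 60".
printf '1 10\n9 19\n10 30\n51 60\n' | merge_ranges 10
```

Holding the current range in `lo` and `hi` and printing only when a gap appears (or at `END`) is what keeps unmerged rows, including the first one, in the output.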
Rahul Verma