
Given a list of integers, is it possible in bash to a) find all the sequences of consecutive numbers, then b) remove all but the last number in each of those sequences?

For example, given this list and assuming that the numbers are stored, one per line, in a .txt file,

001
002
003
005
007
010
011
012

is there a program/set of programs that would produce output

003
005
007
012

and if so, how? Thank you for your time.

EDIT:

Here's what I have so far:

#!/bin/bash

cat file.txt | numinterval >> interval.txt

integer=''
while read -u 3 interval
do
    if [[ "$interval" -ne "1" ]]
    then echo "$integer" >> desequenced.txt
    else read -u 4 integer
    fi
done 3< interval.txt 4< file.txt

The central idea is to run the sorted list of integers through numinterval, then to check if the numinterval list has any ones. If it does, move on to the next integer. If not, print the corresponding integer to a file.


10508
10861
10862
10906
10906
10909
10909
10950
10950
11179
11181
11182
11325
11325
11341
11341
11428
11428



































Here is the output. Obviously something has gone wrong: not only are the consecutive numbers not removed, there is also a huge amount of whitespace after the list has ended.

Any help is appreciated.
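
A note on the attempt above, offered as a sketch rather than a definitive diagnosis: in the posted loop, `read -u 4 integer` only runs in the `else` branch, so the integer stream does not advance whenever an interval differs from 1, and `>>` appends to `interval.txt` on every run instead of replacing it. The version below keeps the same lockstep idea but reads one integer and one gap per iteration. The `gaps` function is a hypothetical stand-in for `numinterval` (whose exact output format is not assumed here), and `desequence` is likewise an illustrative name.

```shell
#!/bin/bash
# Hypothetical stand-in for numinterval: print the gap between each pair
# of neighbouring input lines (so one line fewer than the input).
gaps() {
    local prev= curr
    while read -r curr; do
        # 10# forces base 10 so zero-padded values like 010 are not read as octal
        [[ -n $prev ]] && echo $(( 10#$curr - 10#$prev ))
        prev=$curr
    done
}

# usage: desequence FILE   (FILE holds sorted integers, one per line)
desequence() {
    local interval integer
    # Advance both streams on every iteration so they stay aligned.
    while read -r -u 4 integer; do
        # Print the integer when there is no following gap (last line)
        # or when the gap to the next integer is not exactly 1.
        if ! read -r -u 3 interval || (( interval != 1 )); then
            printf '%s\n' "$integer"
        fi
    done 3< <(gaps < "$1") 4< "$1"
}
```

Calling `desequence file.txt > desequenced.txt` (with `>` rather than `>>`) would overwrite the result file on each run instead of growing it.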

  • Yes, such a program can be written. Is that your only question? – ggorlen May 24 '19 at 18:21
  • Error noted and corrected. – TheRavenKing May 24 '19 at 18:30
  • Thanks for clarifying. However, this is still [off-topic](https://stackoverflow.com/help/on-topic). On-topic questions provide a [mcve] showing your attempt at solving the problem. Otherwise, it sounds like a "give me the code" request. – ggorlen May 24 '19 at 18:44

3 Answers


One way, using awk:

$ awk 'NR > 1 && $0+0 != prev+1 { print prev }
       { prev = $0 }
       END { print prev }' test.txt
003
005
007
012
Shawn

I wrote this ugly thing. You just need to figure out how to create your arr and how to pretty print the result.

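arr=( 1 2 3 5 7 10 11 12 )
result=()
# Compare each element with its successor; keep it when the run ends there.
for (( i=0; i<${#arr[@]} - 1 ; i++ )); do
        curArg=${arr[$i]}
        nextArg=${arr[$i+1]}
        if ((curArg != nextArg - 1 )); then
                result+=($curArg)
        fi
done
# The final element always ends a run.
result+=(${arr[-1]})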
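
The two steps left open above (creating `arr` and printing the result) could be filled in roughly as follows. This is a minimal sketch, assuming Bash 4.3 or later (for `mapfile` and the negative subscript) and an input file with one integer per line; `last_of_runs` is a hypothetical helper name, not something from the answer.

```shell
#!/bin/bash
# usage: last_of_runs FILE   (hypothetical helper; FILE has one integer per line)
last_of_runs() {
    local -a arr result=()
    local i
    mapfile -t arr < "$1"           # one array element per input line
    (( ${#arr[@]} )) || return 0    # nothing to print for an empty file
    for (( i = 0; i < ${#arr[@]} - 1; i++ )); do
        # 10# forces base 10, so zero-padded values such as 008 are accepted
        if (( 10#${arr[i]} != 10#${arr[i+1]} - 1 )); then
            result+=("${arr[i]}")
        fi
    done
    result+=("${arr[-1]}")          # the final element always ends a run
    printf '%s\n' "${result[@]}"    # one result per line, padding preserved
}
```

The `10#` prefix matters only for zero-padded input like the question's `001` ... `012`; with the unpadded literals in the array above it makes no difference.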

colos_enough
  • Array subscripts are already an arithmetic context so you don't need `$(())` inside `[]`. You especially don't need it on the last line since you're not performing any arithmetic. Also, you can do `(( last = len - 1 ))` which I find cleaner looking. Your `if` would be better as an integer comparison: `if ((curArg != nextArg - 1 ))`. You seem to be setting up `result` as an array, but you're appending to it as a scalar. Use `result+=($curArg)` instead. Also your first `echo` appears to be intended to output a confirmation of the array's value. You should always quote variables... – Dennis Williamson May 24 '19 at 20:52
  • ...for output and you should rarely use `*` for an array subscript unless you understand what it does and actually need that behavior. Instead use `echo "${arr[@]}"`. Even better for testing/confirmation purposes: `declare -p arr` which shows the structure of the array. You could collapse your assignments of `last` and `len` into only assigning `last`: `(( last = ${#arr[@]} - 1 ))` or omit its use entirely using `for (( i=0; i<${#arr[@]} - 1; i++ )); do` (fixed a missing semicolon) and `result+=(${arr[-1]})` – Dennis Williamson May 24 '19 at 20:56
  • Otherwise, good solution. By the way, Bash doesn't support arrays of arrays. – Dennis Williamson May 24 '19 at 21:02

Try this Shellcheck-clean pure Bash code:

#! /bin/bash -p

prev=
while read -r curr || [[ -n $curr ]] ; do
    [[ -n $prev ]] && (( 10#$curr != (10#$prev+1) )) && printf '%s\n' "$prev"
    prev=$curr
done <file.txt
[[ -n $prev ]] && printf '%s\n' "$prev"
  • || [[ -n $curr ]] is to enable the code to work even if the last line in the input file is not terminated. See Read a file line by line assigning the value to a variable.
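  • The 10# prefix in 10#$curr and 10#$prev forces the variable contents to be treated as decimal numbers. Otherwise a leading zero would make Bash read the value as octal: 010 would be treated as decimal 8 instead of decimal 10, and 008 would be rejected as an invalid octal number.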
  • No checks are done to ensure that the input lines contain (only) decimal numbers. A real program should do such checks.
  • Since no checks are done on the validity of the input, the code uses 'printf' instead of 'echo' to reduce the possibilities for confusion if the input is bad. See Why is printf better than echo?.
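
The input check mentioned in the notes above could look something like this. It is only a sketch: the function name `print_run_ends` is hypothetical, and skipping bad lines with a warning on stderr is one assumed policy among several (aborting on the first bad line would be just as reasonable).

```shell
#!/bin/bash
# Hypothetical wrapper around the same loop; reads numbers from stdin.
print_run_ends() {
    local prev= curr
    while read -r curr || [[ -n $curr ]]; do
        # Accept only lines made up entirely of decimal digits.
        if [[ ! $curr =~ ^[0-9]+$ ]]; then
            printf 'skipping invalid line: %s\n' "$curr" >&2
            continue
        fi
        [[ -n $prev ]] && (( 10#$curr != 10#$prev + 1 )) && printf '%s\n' "$prev"
        prev=$curr
    done
    [[ -n $prev ]] && printf '%s\n' "$prev"
    return 0
}
```

It would be called as `print_run_ends < file.txt`. Because the regular expression admits only digits, the later `10#$curr` expansion never sees anything that arithmetic evaluation could misinterpret.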
pjh