3

I come here with a solution I just wrote but I definitely can't find a better way to sort my array as fast as possible.

I actually need the algorithm to give me the answer in less than 1 sec, on an array containing 100'000 integers.

Here's the code :

read N
for (( i=0; i<N; i++ )); do
    read arr[i]
done

sorted=($(printf '%s\n' "${arr[@]}"|sort -n))

min=$(( sorted[1]-sorted[0] ))

for (( i=1; i<N-1; i++ )); do
    diff=$(( sorted[i+1] - sorted[i] ))
    if [ "$diff" -lt "$min" ]; then
        min=$diff
    fi
done

echo $min

N is the number of elements, in our case 100'000 as I said.

The thing is, after my array is sorted, I need to loop through it and calculate the minimum distance between two integers :

For example, with (3, 5, 8, 9), the minimum distance is 1 (between 8 and 9).

I'm such a newbie in Bash, so I don't even know if this is an efficient way;

In fact, the gain of speed could come from the second part of the code, and not from the sorting part...

Any ideas ?

Thanks in advance,

Eldoïr
  • 109
  • 1
  • 11
  • Do you wish to optimize your current code or would you like different, faster code? – ShellFish Jun 22 '15 at 21:06
  • I don't need to keep this code, so another faster code could fit ! – Eldoïr Jun 22 '15 at 21:07
  • 2
    The second part finds the minimum distance in linear time. I don't think you can achieve better (asymptotic) performance – fferri Jun 22 '15 at 21:07
  • Then have a look at this: http://stackoverflow.com/questions/7442417/how-to-sort-an-array-in-bash. Also, as *mescalinum* noted, the second part is asymptotically optimal so you need not optimize there. – ShellFish Jun 22 '15 at 21:07
  • One thing you could do is return when a distance of `0` has been found (or `1` if no duplicates are allowed). You cannot find a smaller difference so further parsing is redundant. – ShellFish Jun 22 '15 at 21:09
  • Thanks for your responses, actually I already visited this link and this is where I picked up the short, one-line version you can see in the middle. Sometimes shorter is not necessarily faster... But I just can't get the qsort function to work, I use it like this : qsort "${arr[@]}" but this doesn't seem to sort my array... – Eldoïr Jun 22 '15 at 21:12
  • Perhaps try this qsort: http://www.bashguru.com/2011/01/quicksort-shell-script.html – ShellFish Jun 22 '15 at 21:19
  • And, the distance will never be 0 or 1 as the data I'm dealing with are large integers, I ran some tests and the distance never falls under 40 ~ – Eldoïr Jun 22 '15 at 21:21
  • Yeah, I already tried that too, I don't know how it works, how do I get the sorted array to loop through it later ? I tried sorted=($(sortnumbers "${arr[@]}")) but I doesn't sort the array either – Eldoïr Jun 22 '15 at 21:28
  • EDIT : My bad, it worked but not fast enough... ! – Eldoïr Jun 22 '15 at 21:32
  • 1
    A bucket sort would be faster O(N) if your Integers are known to be a fixed set. – Steve Jun 22 '15 at 21:32
  • 3
    Does it have to be bash? bash is about the worst possible language you could pick for this type of stuff, due to all the unnecessary string parsing. I'm willing to bet you'll get an easy 10x speedup by recoding in a compiled language. – j_random_hacker Jun 22 '15 at 21:34
  • @Steve : What do you mean by "fixed set" ? (Sorry, english is not my native language) – Eldoïr Jun 22 '15 at 21:39
  • @j_random_hacker I totally agree with you, but this problem is part of a challenge which is, sadly, to write the algorithm in bash :) I had a working solution in C in a few minutes but that was the easy part - so the challenge is about optimization ! – Eldoïr Jun 22 '15 at 21:39
  • @user3623965: OK, makes sense now :) – j_random_hacker Jun 22 '15 at 21:44
  • @user3623965 Let's say your integers are only single digits. So the fixed set would be `{0,1,2,3,4,5,6,7,8,9}` You would implement the bucket sort by having a counter for each distinct integer and then rebuild the sorted array from that. Linear time for both the counting and array recreation. – Steve Jun 22 '15 at 22:11
  • Oh ok, actually you asked me if there was a range of values that never changed in my array ? Because that is the case, my values are all positive and < 1,000,000. – Eldoïr Jun 22 '15 at 22:48
  • Should this question be moved to http://codereview.stackexchange.com/ ? – Brent Washburne Jun 22 '15 at 22:50
  • 2
    @BrentWashburne This question *could* be a good question on Code Review, but first ask the question: Is it *off-topic* for Stack Overflow? That's the most important requirement for migration. Code Review is not a migration target for Stack Overflow so only a *custom flag* could get the question moved there, if you feel that it should be moved. – Simon Forsberg Jun 22 '15 at 22:54
  • I'm not sure if it should be moved, that's why I'm asking all of you. – Brent Washburne Jun 22 '15 at 22:57

3 Answers3

2

numbers range <0,1000000) is small enough

  • as your numbers are not repetitive (smalest distance is>=40)
  • then you just need a bit per value (if it is present or not)
  • so you need up to 1MB for Byte/value or up to 128KB for bit/value
  • (as the K,M are based on 1024 so you have large enough reserve)

so Bucket sort is a possibility:

  1. create array of flags for each possible value

    • bool flag[1000000];
  2. clear the flags O(M)

    • for (int i=0;i<1000000;i++) flag[i]=0;
  3. compute flags by processing your array arr[N] ... O(N)

    • for (int i=0;i<N;i++) flag[arr[i]]=1;
    • for repetition just increment flag[] unless it is too big to avoid overflow
    • in case the flag[arr[i]] is already 1 then return distance = 0
  4. reconstruct the array O(M)

    • for (int i=0,j=0;i<1000000;i++) if (flag[i]) { arr[j]=i; j++; } N=j;
  5. now compute the distance O(N)

    • you can mix these steps together so you do not need the arr[N] in memory
    • instead just count the consequent zeroes in flag[] ...

[Notes]

  • M=1000000
  • N<=M
  • if N is much much smaller then M then this may not be the fastest way ...
Spektre
  • 49,595
  • 11
  • 110
  • 380
  • This seems to be a great solution and you explained it very well, thanks to you ! I will check the bash syntax for this and post my results asap :) – Eldoïr Jun 23 '15 at 09:54
2

I finally came with a simple and pretty elegant solution :

read N

for (( i=0; i<N; i++ )); do
    read tab[i]
done

printf "%i\n" ${tab[@]} | sort -n | awk 'BEGIN{dmin=1000000;} (NR>1){d=$1-prec; if (d<dmin) dmin=d;} {prec=$1;} END{print dmin;}'

And that's it. :) Thanks to all of you for taking the time to help me ! ;)

Eldoïr
  • 109
  • 1
  • 11
1

Given this awesome page on sorting algorithm complexity I would use the Radix Sort (here in Python, I didn't find a Bash implementation but I'm still looking):

#python2.6 <
from math import log

def getDigit(num, base, digit_num):
    # pulls the selected digit
    return (num // base ** digit_num) % base  

def makeBlanks(size):
    # create a list of empty lists to hold the split by digit
    return [ [] for i in range(size) ]  

def split(a_list, base, digit_num):
    buckets = makeBlanks(base)
    for num in a_list:
        # append the number to the list selected by the digit
        buckets[getDigit(num, base, digit_num)].append(num)  
    return buckets

# concatenate the lists back in order for the next step
def merge(a_list):
    new_list = []
    for sublist in a_list:
       new_list.extend(sublist)
    return new_list

def maxAbs(a_list):
    # largest abs value element of a list
    return max(abs(num) for num in a_list)

def split_by_sign(a_list):
    # splits values by sign - negative values go to the first bucket,
    # non-negative ones into the second
    buckets = [[], []]
    for num in a_list:
        if num < 0:
            buckets[0].append(num)
        else:
            buckets[1].append(num)
    return buckets

def radixSort(a_list, base):
    # there are as many passes as there are digits in the longest number
    passes = int(round(log(maxAbs(a_list), base)) + 1) 
    new_list = list(a_list)
    for digit_num in range(passes):
        new_list = merge(split(new_list, base, digit_num))
    return merge(split_by_sign(new_list))
Thomas Ayoub
  • 29,063
  • 15
  • 95
  • 142