-2

This is my algorithms assignment, and I do not know how to proceed.

Given an array A of m strings, where different strings may have different numbers of characters, but the total number of characters over all the strings in the array is n. Show how to sort the strings in O(n) time. Note that the desired order here is the standard alphabetical order; for example, a < ab < b. More technically speaking, A is an array of pointers each pointing to a string (which is another array of characters); you can think about how strings are used in C. Also, we assume that each character can be viewed as an integer ranging from 0 to 255.

Suhaib Ahmad
  • 487
  • 6
  • 25
  • The fastest you can sort an array of strings of size `N` is roughly `O(N*lgN)`, which you get from doing merge sort or quick sort. – Tim Biegeleisen Oct 14 '15 at 07:11
  • Possible duplicate of [this](http://stackoverflow.com/questions/2352313/is-there-an-on-integer-sorting-algorithm). – npinti Oct 14 '15 at 07:11
  • As linked to by npinti, a least significant to most significant character counting / radix sort would take (maximum length of a string) passes, moving m strings on each pass. If the maximum length of a string is considered to be a constant, such as n or some other constant, then the time complexity could be considered to be O(m) (not O(n)). If m is considered to be the constant, and n is variable, then the time complexity could be considered to be O(n). – rcgldr Oct 14 '15 at 07:25
  • https://en.m.wikipedia.org/wiki/Trie – n. m. could be an AI Oct 14 '15 at 08:06
  • @TimBiegeleisen This isn't really just basic string sort. This is having n total characters, which are divided into m substrings. And then the m substrings need to be sorted in O(n), with m <= n, obviously. – hyde Oct 14 '15 at 08:15
  • 1
    To OP: you should perhaps show a bit more effort than writing "here's my assignment copy-pasted, help me!". Like, explain what you don't understand about it, or tell it in your own words (which amounts to rubber duck problem solving, so it's actually useful in itself!). – hyde Oct 14 '15 at 08:38
  • @TimBiegeleisen "The fastest you can sort an array of strings of size N" -- that's not relevant. `n` in the problem statement is a very different number from your N. – Jim Balter Oct 14 '15 at 13:50

2 Answers2

3

Since this is an assignment I won't provide a complete answer, merely some ideas on how to proceed.

Since the strings can be any length you need to use an O(n) sorting algorithm.

One such algorithm is bucket sort.

So how do we arrange the buckets for variable length strings?

Create 256 buckets for the first character. Let each bucket have a counter + a set of 256 buckets for the second character and so on.

Important note: Don't create any bucket set until you need to or the memory consumption will be infinite. Let an empty bucket set be NULL

When we have the bucket system set up. How do we sort a word into the bucket system?

Let's say we have the word Yes.

First character is Y so we go to the top level bucket set. The set is NULL so we create the top level and select bucket 'Y'.

Next character is e. The bucket set under Y is NULL so we create the set and select bucket 'e'.

Next character is s. The bucket set under Ye is NULL so we create the set and select bucket 's'.

The string ends. Increase the count for the current bucket Y->e->s.

Note that the task will be simpler if you use unsigned char, because then you can use the value directly as an index into an array of length 256.

The bucket struct could look like this:

typedef struct bucket {
    int count;
    struct bucket *next;  // points to NULL or array of 256 buckets.
} bucket;

Time Complexity:

The maximum amount of work for each character is:

end of string check + NULL check + ((allocation and initialization of array of 256 buckets (I would use calloc for this) or (increase one bucket count)) + increase loop variable.

Memory Usage

Here comes the disadvantage of bucket sort. It uses a lot of memory if you need many buckets, and this solution will use quite a number.

Klas Lindbäck
  • 33,105
  • 5
  • 57
  • 82
0

You may think about String as about number in 255 basis. So, if distribution is uniform, then bucket sort would give linear time; also radix sort is linear, but you need some preparations before sort (to transform String into number).

Ivan Ivanov
  • 2,076
  • 16
  • 33