UUID cassandra sorting?

Question

Let's say I have a user column family whose unique row keys are generated with a prefix for a specific client:

<?php
uniqid("serverA"); // generates something like: serverA4b3403665fea6
?>

I can select rows via secondary indexes, for example (the birthdate example from phpcassa):

$column_family = new ColumnFamily($conn, 'Indexed1');
$index_exp = CassandraUtil::create_index_expression('birthdate', 1984);
$index_clause = CassandraUtil::create_index_clause(array($index_exp));
$rows = $column_family->get_indexed_slices($index_clause);
// returns an Iterator over:
//    array('winston smith' => array('birthdate' => 1984))

foreach($rows as $key => $columns) {
    // Do stuff with $key and $columns
    print_r($columns);
}

However, I want a query that returns only the 30 most recently added users (created keys) per page, in a multi-page layout where each following page shows older keys.

The only option I have found so far is to use the UUID support in phpcassa:

uuid1() generates a UUID based on the current time and the MAC address of the machine.

Pros: Useful if you want to be able to sort your UUIDs by creation time.

Cons: Potential privacy leakage since it reveals which computer it was generated on and at what time.

Collisions possible: If two UUIDs are generated at the exact same time (within 100 ns) on the same machine. (Or a few other unlikely marginal cases.)

uuid2() doesn't seem to be used anymore.

uuid3() generates a UUID by taking an MD5 hash of an arbitrary name that you choose within some namespace (e.g. URL, domain name, etc).

Pros: Provides a nice way of assigning blocks of UUIDs to different namespaces. Easy to reproduce the UUID from the name.

Cons: If you have a unique name already, why do you need a UUID?

Collisions possible: If you reuse a name within a namespace, or if there is a hash collision.

uuid4() generates a completely random UUID.

Pros: No privacy concerns. Don't have to generate unique names.

Cons: No structure to UUIDs.

Collisions possible: If you use a bad random number generator, reuse a random seed, or are very, very unlucky.

uuid5() is the same as uuid3(), except using a SHA-1 hash instead of MD5. Officially preferred over uuid3().

But that means I would have to rewrite some parts of my code, and I would introduce the possibility of collisions.

Are there any smart hacks I haven't thought of?

score 3 · Answer 1 · edited Jun 23 '13 at 23:13

First, regarding UUIDs, you don't need to worry about collisions if you're planning on using either uuid1() or uuid4() (these are the only ones that really get used anyways). The probability of such an event is astronomically low. Don't worry about it.
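To put a rough number on "astronomically low" for uuid4(), here is a back-of-the-envelope sketch using the birthday-bound approximation. It assumes all 122 random bits of a version 4 UUID come from a good random source; the function name collision_probability and the figure of one billion keys are made up for illustration.

```php
<?php
// Birthday-bound approximation: p ~= n^2 / (2 * 2^bits).
// Only meaningful while the resulting probability is small.
function collision_probability(float $n, int $randomBits): float {
    return ($n * $n) / (2.0 * pow(2.0, $randomBits));
}

// A version 4 UUID carries 122 random bits (6 of the 128 are fixed).
// One billion generated keys is an arbitrary illustrative volume.
$p = collision_probability(1e9, 122);
printf("approximate collision probability: %.2e\n", $p);
```

Even at a billion keys the estimate stays many orders of magnitude below anything worth designing around.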

For getting the 30 most recently added keys (along with paging capabilities), you're really talking about time series data. Here's a good intro to timeseries with Cassandra. You could either use timestamps or v1 UUIDs as the column names, and the unique keys as the column values. If you choose to use v1 UUIDs for the unique keys, you could just put those directly in the column names. At that point you're just dealing with normal time series data and paging in Cassandra.
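The paging idea can be sketched without touching Cassandra at all. The example below uses a plain PHP array as a stand-in for one wide "timeline" row, with creation timestamps as column names and the user keys as column values. It is not phpcassa code: build_timeline and fetch_page are hypothetical helpers, and the integer timestamps are an assumption (v1 UUIDs would play the same role). Against a real cluster, the equivalent would be a reversed column slice of 31 columns starting at the saved column name.

```php
<?php
// In-memory stand-in for one wide row: column name => column value.
// Column names are creation timestamps, values are the user row keys.
function build_timeline(int $count): array {
    $timeline = [];
    for ($i = 1; $i <= $count; $i++) {
        $timeline[1000000 + $i] = sprintf('serverA%04d', $i);
    }
    return $timeline;
}

// Return one page of the newest entries plus the column name at which
// the next (older) page starts. $start = null means "start at the newest".
function fetch_page(array $timeline, ?int $start, int $pageSize): array {
    krsort($timeline);                  // newest first, like a reversed slice
    $page = [];
    $next = null;
    foreach ($timeline as $ts => $userKey) {
        if ($start !== null && $ts > $start) {
            continue;                   // skip columns newer than the page start
        }
        if (count($page) === $pageSize) {
            $next = $ts;                // first column of the following page
            break;
        }
        $page[$ts] = $userKey;
    }
    return [$page, $next];
}

$timeline = build_timeline(70);
[$page1, $next1] = fetch_page($timeline, null, 30);   // 30 newest users
[$page2, $next2] = fetch_page($timeline, $next1, 30); // next 30, older
[$page3, $next3] = fetch_page($timeline, $next2, 30); // remaining 10
```

Reading one column more than the page size is what yields the start marker for the next page; when no extra column comes back, $next is null and there are no older pages.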
