
The other day I saw a method for querying for a random document from a collection using AQL on this very same website:

Randomly select a document in ArangoDB

My implementation of this at the moment is:

// brands
let b1 = (
    for brand in brands
        filter brand.brand == @brand1
        return brand._id
)

// pick a random car with brand 1
let c1 = (
    for edge in edges
        filter edge._from == b1[0]
        for car in cars
            filter car._id == edge._to
            sort rand()
            limit 1
            return car._id
)

However, when I use that method the result can hardly be called 'random'. For instance, in a collection of 3500+ documents I manage to get the same document 5 times in a row, and over the course of 25+ attempts there are maybe 3 to 4 documents that keep being returned to me. The method seems biased towards particular documents.

I was wondering whether this can still be improved, or whether there is another method that wasn't mentioned in that thread. I can't comment on the thread yet due to low reputation, so I can't ask the question in the same place, but I think it merits a discussion nonetheless. I hope someone can help me get better randomization.

Omnia87
  • For what purpose do you want to fetch a random document? Do you want to display a random car on a website as "recommendation" / alternative to the current selection done by the user? Are there any other properties except a uniform distribution that you require? – CodeManX Mar 19 '18 at 11:26
  • Hi @CoDEmanX, thanks for your question! It's basically part of a guessing game. Let's say it's like a random model gets shown to the end user and this person has to guess the correct manufacturer. So randomness is quite important in order for the game not to become stale. In any case the code above is just an example really, the content is not that relevant at the moment. – Omnia87 Mar 19 '18 at 11:31
  • I see. For such a guessing game, wouldn't it be desirable not to repeat cars within a single game for a specific user, or maybe even across games for that user? And a technical question: Do you use RocksDB as storage engine? – CodeManX Mar 19 '18 at 12:24
  • That's definitely something to think about but thus far I was only concerned with having a viable way to have any randomness, especially as there will be several other random parameters later down the line. All of which show the same behavior but where I'm sure I want to have some repetition. I'm currently not using RocksDB no. – Omnia87 Mar 19 '18 at 12:34
  • The `rand()` should really take a seed. I opened an issue on GitHub: https://github.com/arangodb/arangodb/issues/4906 – Andrew Grothe Mar 20 '18 at 16:21
  • Based on feedback from the GitHub issue, this is a Windows-only issue. Are you using Windows as well? – Andrew Grothe Mar 20 '18 at 23:35
  • That's correct @AndrewGrothe, I'm on a Windows machine at the moment. – Omnia87 Mar 23 '18 at 12:58
  • Ok, you can track the bug via the link I posted earlier. They are going to increase the seed used on Windows. I updated my answer to reflect that as well. – Andrew Grothe Mar 23 '18 at 13:08

1 Answer


Essentially the rand() function is being seeded the same on each query execution. Multiple calls within the same query will be different, but the next execution will start back from the same number.

I ran this query and saw the same 3 numbers each time:

return {
    "1": rand(),
    "2": rand(),
    "3": rand()
}

Not always, but more often than not, I got the same numbers:

[
  {
    "1": 0.5635853144932401,
    "2": 0.19330423902096622,
    "3": 0.8087405011139256
  }
]
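The repetition described above is what any pseudorandom generator does when it restarts from the same seed. A minimal Python sketch of that effect, not ArangoDB code; the `run_query` helper and the seed values 42 and 43 are made up for illustration:

```python
import random

def run_query(seed):
    """Simulate one query execution: a fresh generator seeded with `seed`."""
    rng = random.Random(seed)
    # Three calls within the same "query" yield three different values.
    return [rng.random() for _ in range(3)]

first = run_query(42)
second = run_query(42)   # same seed: the sequence repeats exactly
third = run_query(43)    # different seed: a different sequence

print(first == second)   # True
print(first == third)    # False
```

Within one run the three values differ from each other, but a second run with the same seed reproduces them exactly, which matches the symptom in the question.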

Then I added the current millisecond to each value (rand() itself takes no seed argument):

return {
    "1": rand() + DATE_MILLISECOND(DATE_NOW()),
    "2": rand() + DATE_MILLISECOND(DATE_NOW()),
    "3": rand() + DATE_MILLISECOND(DATE_NOW())
}

Now I always get a different number.

[
  {
    "1": 617.8103840407173,
    "2": 617.0999366056549,
    "3": 617.6308832757169
  }
]
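One caveat that seems worth checking: the millisecond offset is the same for every call within one execution (all three results above start with 617), so it shifts the values without changing their relative order. For a `sort rand() limit 1` as in the question, the same document would still come first whenever the underlying rand() values repeat. A small Python check using the numbers from the first result above:

```python
# Values copied from the first result above; 617 is the millisecond offset
# visible in the second result.
values = [0.5635853144932401, 0.19330423902096622, 0.8087405011139256]
offset = 617

def ranking(keys):
    """Return the indices of `keys` in ascending key order."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

plain = ranking(values)
shifted = ranking([v + offset for v in values])
print(plain, shifted)  # [1, 0, 2] [1, 0, 2]
```

Both rankings are identical, so an additive offset changes the numbers returned but not which element sorts first.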

You can use various techniques to produce pseudorandom numbers that won't repeat the way rand() does when it starts from the same seed.
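One such technique, sketched in Python under the assumption that the application fetches the candidate IDs first and makes the choice itself: draw the index from the operating system's entropy source, which does not depend on how the database seeds rand(). The `pick_random` helper and the sample IDs are hypothetical.

```python
import secrets

def pick_random(candidates):
    """Pick one element uniformly using the OS entropy source."""
    if not candidates:
        raise ValueError("no candidates to pick from")
    # secrets.randbelow(n) returns an int in the range [0, n).
    return candidates[secrets.randbelow(len(candidates))]

# Hypothetical document IDs, standing in for the result of the car query.
car_ids = ["cars/1001", "cars/1002", "cars/1003"]
choice = pick_random(car_ids)
print(choice)
```

The same index could instead be passed to the query as a bind parameter; whether that is cheaper than returning all IDs depends on the size of the result.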

Edit: this is actually a Windows bug. If you can use Linux, you should be fine.

Andrew Grothe