I need to get a stratified sample of my huge table. Specifically, I want to select 1/n of the rows from my table without bias, for example by selecting rows at random or by taking every nth row.
Before asking this question, I tried doing this. However, it didn't work for me because I am using the InfiniDB engine which, as I found out later, doesn't seem to support user variables in sub-expressions. Does anyone know a way to do this without user variables?
I was thinking about something like this: in my table, every row has a unique alphanumeric string id, which can look like "1234567890" or like "abcdef12345". I was thinking of somehow converting that string to a number, and then using the modulo function to select only 1/n of the rows from my table. However, I have no idea how to do the conversion, as this string is not hexadecimal.
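To make the idea concrete, here is a minimal sketch in Python (not SQL) of what I mean by "convert the string to a number, then take it modulo n". It uses a CRC32 checksum as the conversion, which works on any string, not just hexadecimal ones. The function name `keep_row` and the generated ids are made up for illustration. MySQL has a `CRC32()` function that could play the same role, but whether InfiniDB supports it is an assumption I have not verified.

```python
import zlib


def keep_row(row_id: str, n: int) -> bool:
    """Return True for roughly 1 in n ids, always the same ones."""
    # CRC32 maps any byte string to an unsigned 32-bit integer,
    # so the id does not need to be numeric or hexadecimal.
    checksum = zlib.crc32(row_id.encode("utf-8"))
    # Keep the row only when the checksum falls into one of n buckets.
    return checksum % n == 0


# Hypothetical ids, standing in for the table's string id column.
ids = [f"abcdef{i:05d}" for i in range(100_000)]
n = 10
sample = [row_id for row_id in ids if keep_row(row_id, n)]
print(f"kept {len(sample)} of {len(ids)} rows")
```

The selection is deterministic, so the same ids are picked on every run, and it needs neither user variables nor an auto-increment column. How close the kept fraction is to 1/n depends on how evenly the checksum spreads the actual ids.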
Note: my table does not have an autoincremented column.