7

Once I've asked about : how does apparently random query string/urls are being generated.

It can be found in many places :

http://www.youtube.com/watch?v=IMl7pvaWzh8
                                   ^
                                   |
                                   +---------------- 

http://jsfiddle.net/xeolabs/LSTKM/light/
                              ^
                              |
                              +---------------- 

http://jsbin.com/asapay/1/edit
                   ^
                   |
                   +---------------- 

I was told ( and it seems logic) that when the server pick a new url , it doesnt check if it is free ( wasn't chosen before)

Also , I was told it can be result of a formula such as: f(n+1) = f(n)+1 ( so it is not random at all.

so the new url param is generated as a result of the last generated url param.

my Question :

where can I found such generators functions ?

of course I can build one of my own such 17=16+1 but i'm looking something ready like :

f(n+1) = f(n)+1 where there is a usages of uppercase , lower case , numbers. and
of course minimum collisions and zero predictions.

you know , something professional...

I'm just curious how google/youtube/jsbin/jsfiddle does it with a millions of requests a day.

Community
  • 1
  • 1
Royi Namir
  • 144,742
  • 138
  • 468
  • 792
  • 2
    You cannot create a hashing function that has zero collisions - it simply cannot be done, unless your hash has the same amount of data as the actual data. You can hash it, but you'll probably need some central store to disambiguate / etc. – Marc Gravell Dec 23 '12 at 20:15
  • @MarcGravell reserve _what_? I can save sequential numbers - yes. but i need that the input of `IMl7pvaWzh8` will yield me `SGierk43`.even if i choose RGNCcryptoGenerator - I cant be sure that taking only thr first 7 bytes ( and convert to letters) will be unique. ( also who said that the char will be able to be at the query string ? - as you know , not all chars can be) – Royi Namir Dec 23 '12 at 20:17
  • 2
    I'm saying: try to hash `IMl7pvaWzh8` using any arbitrary method; but then say you get `SGierk43` - you might need to check whether `SGierk43` is in use, and if so, re-hash (with some other factor) to get a new value (rinse, repeat, etc); you need to store the slugs centrally **anyway**, because you can't reverse them. You need a lookup that `SGierk43` (or whatever) gets you back to the original `IMl7pvaWzh8` – Marc Gravell Dec 23 '12 at 20:20
  • @MarcGravell are you telling me that every time a new video in google is uploaded and has a new query string - Google scans its db to see if it's free ? ( and in my prev comment I meant RNGCryptoServiceProvider) – Royi Namir Dec 23 '12 at 20:20
  • @MarcGravell can you please answer my last comment? – Royi Namir Dec 23 '12 at 20:27
  • @RoyiNamir there are tricks to speed up this operation (ie partitioning and ranging) – Adrian Dec 23 '12 at 20:28
  • @Adrian of course there are always solutions. this is what this site is for. get answers. can you provide link/information / code ? – Royi Namir Dec 23 '12 at 20:30
  • I'll write an answer for you – Adrian Dec 23 '12 at 20:33
  • I write and here the link to a simple solution, what I suggest is just a unique incrimental number that you then compress it to base-64 : http://stackoverflow.com/questions/8528720/is-there-a-simple-way-to-generate-a-un-duplicate-string/8528822#8528822 I write it as a comment because there is no reason to copy/paste the same answer. – Aristos Dec 23 '12 at 20:42
  • @Aristos Im already running it :-) – Royi Namir Dec 23 '12 at 20:47
  • @RoyiNamir I running it some months now, working super. If you also scramble the alphabet then you have and a basic encryption. – Aristos Dec 23 '12 at 20:48
  • @Aristos do you mean: change the `char []` elements order ? ( only once) – Royi Namir Dec 23 '12 at 20:50
  • @RoyiNamir Yes this is what I mean. The encode is use a set of character and the order of them is part of the encoding. In the function `string Encode(long inp, IEnumerable map)` the map is a `char []` array. If you scramble it you see. Be sure on this array each character appears ones. – Aristos Dec 23 '12 at 20:52
  • @Aristos what is this _"for a 48-bit base, omits l/L, 1, i/I, o/O, 0"_ – Royi Namir Dec 23 '12 at 21:03
  • @RoyiNamir Its say you that what ever characters you place there you change the base. You can place also numbers, capital or low case or all together, symbols etc... The more the characters (or the less) you change the compression level - the base. This characters have type is missing, and you can add them also. – Aristos Dec 23 '12 at 21:12
  • @Aristos this is what amaze me with stackoverflow. Im asking this question twice, and I still - won't ever know how it is done in real life ( mega sites). it is amazing.( and sad) – Royi Namir Dec 23 '12 at 21:19
  • @RoyiNamir - This is a great site to find out more about high scalability websites http://highscalability.com/all-time-favorites/ – keyboardP Dec 23 '12 at 21:23
  • @Aristos it is s asite to read from . not asking question to.....:-( – Royi Namir Dec 23 '12 at 21:25
  • @RoyiNamir This is the way they do that. The way I describe it, the mathematics and the math idea is the same. Some small details did not change the idea. – Aristos Dec 23 '12 at 21:25
  • @Aristos please supply your AVG(comments) to an answer so i can check it. – Royi Namir Dec 23 '12 at 21:31
  • @Royi I don't k ow how they do it, specifically. If I had to guess, I'd speculate that different data-centres issue different ranges. – Marc Gravell Dec 23 '12 at 21:40
  • @Aristos is there any special reason for base 48 ? I could add more chars and use a larger base....no ? – Royi Namir Dec 24 '12 at 20:01

5 Answers5

2

Thinking out loud, but you could just precalculate a huge list of unique hashes and assign them to any new inputs. Precalculating will ensure you can keep checking for collisions as it's not required in real time. You can look into generating random hashes in this question.

Community
  • 1
  • 1
keyboardP
  • 68,824
  • 13
  • 156
  • 205
  • nice. so there will be a nightly SQL job which produce new "scrambles". and in real time I wont need to do it. but nightly in USA is morning in Israel.and They both need to see that a key hasn't been chosen.... – Royi Namir Dec 23 '12 at 20:39
  • It would only be nightly if you're reserves are being taken up that quickly. As a fallback, you could create a random hash in real time if there are no more left. URL shortening algorithms might be something you could look into http://www.codinghorror.com/blog/2007/08/url-shortening-hashes-in-practice.html – keyboardP Dec 23 '12 at 20:41
  • So according to your answer , google provided much more free urls than then incoming one so the stack is never empty . there are always a pre-readyForLongTime-new entries....right ? this way I dont care who is adding. Israel or Usa since both are adding to the top of the stack which has many more items ( although every milisecond a new entry is being taken) – Royi Namir Dec 23 '12 at 20:45
  • I'm not sure what Google actually does, but it was just an idea that could be possible. A new entry every millisecond is 86400000 a day. I'm not sure how well that would work but you would need a very well structured backend setup. – keyboardP Dec 23 '12 at 20:49
0

This could not exactly be answer to your specific question but if you need a function that returns unique and unpredictable string then there is one:

Guid.NewGuid().ToString()

I often use it to form unique querystrings in various scenarios.

Wiktor Zychla
  • 47,367
  • 6
  • 74
  • 106
  • IMHO it is being used with RGNCcryptoServiceProvide ( which i mentioned already). Also - it is not in a form of `f(n+1)=f(n)+1`. ( and I don't know if it's matters or not) – Royi Namir Dec 23 '12 at 20:36
  • That's why I am not sure if it fits your requirements. Does it really have to be of the iterative form or you just need random querystrings? – Wiktor Zychla Dec 23 '12 at 20:38
  • I'm just curious how google/youtube/jsbin/jsfiddle does it with a millions of requests a day. – Royi Namir Dec 23 '12 at 20:40
  • They probably use a simple one way hash with any collision resolver (take new hash, rehash the hash, move to a next free slot). – Wiktor Zychla Dec 23 '12 at 20:44
0

System.IO has a random filename generator, perhaps you could hijack that.

string randomString = System.IO.Path.GetFileNameWithoutExtension(System.IO.Path.GetRandomFileName());

returns somthing like "jdvpmpre"

you could join a couple together to make it more unique, but this would be a quick simple solution.

sa_ddam213
  • 42,848
  • 7
  • 101
  • 110
0

One solution (that I have used myself) could be this:

Requirement: a unique source for an increasing sequential number (like a sequence in Oracle or a auto-incremental index in SQL Server or the like) - anything that you can handle reliably for producing such an incremental source.

Workflow for generating every new URL (or whatever you need it for): 1 - Get the next value of your sequence. 2 - Convert it to a base 36 number (you can google for it's implementations in C# like this one). 3 - Use the generated base 36 number in your URL (or what ever you are doing like modifying database, etc).

Note on base 36 number: We use decimal system in our daily operations which consists of 10 digits. We use hexa decimal numbers in computers which is produced from 16 digits (0-9 plus A,B,C,D,E and F). Now there is a base 36 system too which is produced by using 36 digits; 0-9 and A-Z and all digits are alphanumeric. So can be used easily in URLs. An example from Wikipedia page: 2,821,109,907,456 decimal would be CRE66I9S in base 36.

Kaveh Shahbazian
  • 13,088
  • 13
  • 80
  • 139
0

Continuing my comment,
Assuming you have several locations which are taking inputs and generating the unique tokens I said you could partition the ranges. For example say you have one site in Israel and one in the USA, and you want both to generate unique tokens (you don't want any overlaps between the tokens generated at these sites), you could use a unique database to store the current token value.

(1) This is the scenario. The db starts with a token with value 1.
(2) Israel site asks the db to get some new tokens, the db will give it range from 1-1000 (not tokens, but the range). This way the Israel site doesn't have to go back to the db for each new request it gets until it uses up all those 1000 tokens.
(3) The USA site goes to the db and gets 1001-2000 range for tokens.
(4) In our example you have 2 consumers and 1 producer (the db). The assumption is that you want to use your db as little as possible to not block the other consumer(s). So if each producer takes 1 second to go to db, then how many ID's should the db give to each consumer. The answer is number of IDs consumer uses/1 second * number of consumers. This way the consumers are not deadlocked waiting on each other for the db to become free.

So how do those producers make use of the range? They could generate base 72 tokens for the range they received from the db by incrementing a counter. Why base 72? Because that gives a short token for a large number. To come up with 72 I used a-z,A-Z,0-9, special characters on the 0-9 keys: 25+25+10+10. You can go higher than 72.

An implementation of session tokens is found at:
https://github.com/hoytech/Session-Token

There is also this question which might be helpful:
How to generate a random alpha-numeric string?

Community
  • 1
  • 1
Adrian
  • 5,603
  • 8
  • 53
  • 85