
I wrote a little function that will create a random string of a certain length:

import random

def append_until_length(acceptable, length=45):
    retval = set()  # a set keeps only unique characters
    for _ in range(1000):
        retval.add(random.choice(acceptable))
        if len(retval) == length:
            return ''.join(retval)  # joined in the set's iteration order

This works, so it's all fine and dandy. But while running it I've noticed a sort of pattern in the output:

>>> for _ in range(10):
...     append_until_length(acceptable)
... 
'!#"%\'(+*-,/.057698=?ADGIHLRUV[]\\`behjmonpryx~'
'"$\')+*,025498:=?ACBGKONQPSY[]\\acdgfhkmruvy{z|'
'#"\'&)+,/03248=<?>ABFHJLOPWYXZ]cbdfhklonqrutz}'
' #"(*-/0328EIJMPSRUWVYX]_^acbegfkmlqpstwvx{}|'
'!#"(,/.032549;=>EDHMLOSYX[]_^acbedjlonprtvxz~'
" %',10346?@CEDFIKNQRVYXZ]\\_abghkjlnqpruw{z}|~"
'! #+,/035469:<@CFIKLSRUVY[Z^cbfijloqsutwvxz}|'
'$&)(+-/5;:?>ABDFIHMLOPSUTYXZa`bdhkjmonprwvx}~'
'!#"&*-/102579:=>@DFKJMLONQSTVYX\\^acimoqpstw}~'
'! &(+-/.2548:=<?A@EGFIKOQPSRTVX\\eihjonprutx}~'
>>> 

If you look at this, the first few characters are always punctuation, the next few are always numbers, then comes the uppercase with some mixed punctuation, another punctuation, lowercase letters, and the last characters are always punctuation.

The acceptable characters I'm using are list(string.printable)[:-6] with a .append(" "). The length of this list is 95:

>>> acceptable
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~', ' ']
>>> len(acceptable)
95
>>> 

Now, I understand that set() will not allow the same character to appear more than once in the string. However, that does not explain why the pattern is always the same (not exactly the same, but roughly the same). If I do this with a list instead, there is never a pattern in the output:

>>> def append_until_length(acceptable, length=45):
...     retval = []
...     for _ in range(length):
...         retval.append(random.choice(acceptable))
...     return ''.join(retval)
... 
>>> for _ in range(10):
...     append_until_length(acceptable)
... 
"] *rZI/<=LwPGU-PzWj)\\jp9tZ}e9T#}4/\\R`4Q^?4)'W"
'%z6wTvuzK;{eS}"^GRf(}a3<"Qqg_*2v?1`y@;=Bn#ycQ'
"t'bqj,*}7:w]:8c;Ddy. 17@^Y0{)>}'25tsl1kf+C%6^"
'RZt)s=?~QrAok+Z\\ei}5K^&1e+w0~*zl{hS2;l]|?p/T;'
'%InO5_fWcJU#v,6_=cPb^cfd1=\\;k{37~$214vd+F&oH&'
'!6Ey#"\'3.,ivG+7\'y[&1`aYNDg-\\j#:! -7(8b#$x)Q1m'
'w}/{mnT\\-IT2?;V_K ZDDy:YzaG+LgGkZWkV8E y@_)Y;'
'e1@71AFDF;|Q.<_fRG0tG*`557z(|}bHDCT+dc}{[QGq8'
"ie~;Iy1O)f!n,Z%%0\\36-!Lke1}cA'uptRS7(2ki|mzgi"
'G=v&#.J1@E$N?NK|~>( E4M/^y[~HK)#Hi$23ez~EY>N '

Even if I treat the list like a set there is still no pattern to the output strings:

def append_until_length(acceptable, length=45):
    retval = []
    for _ in range(10000):
        char = random.choice(acceptable)
        if char not in retval:
            retval.append(char)
        if len(retval) == length:
            return ''.join(retval)

8hKO W5"'ERJa/N$vb9^4!)fig:c_n&?@(#}oTC]qePwZ
,b2;Y^VD9|:O!>QilH`4(7/F?8f&5~_B$x#pN{Igahs\n
_z1eDiH$9k&rRt>M/FOqb8SLY.{|0dI4A^:l,3cs7ng][
Y/iu#eOlVMmZ 9S`t?1JX2$<)&|jUz'"~wLIvoqkr}!(H
r~/m{8SLvU?_aVX4A"0%zEgK1I!9#B|snphOZb,@jw\]2
;nX!T20.^b"\eqNExOlrQF'V&#(%iht{Hw+-Sy,Dj]:9[
B@%H[2f&JuwSd1bEnih#}]3jTMLzAW.ZG~,tX|!/N_`D(
usv}KkZgL]&<hY^6Blp\GENTrFC~Xw3#4S8QmRf"PUnM|
?G3Ao[z7gVLve-}S>X]&<+k(DZ*UcsM50r)^1Om`P4K,6
,#&(1-'sj9qy7~dZpuIk!%Q D8haSNrco{xe;=.T[WK0<

So my question is: why does the pattern occur with a set? The uppercase and lowercase characters have different ord values, and are therefore different characters, i.e.:

>>> ord("c")
99
>>> ord("C")
67
>>> 

So in my head, it doesn't make sense that there is a pattern in the strings if they are randomly generated. According to help(set):

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |  
 |  Build an unordered collection of unique elements.
CertifcateJunky

1 Answer

Your issue is that sets aren't really unordered. They follow an implementation-defined order that you cannot rely on or predict (and which can differ between two runs of the Python interpreter), but it's there (see "'order' of unordered Python sets").

In your case the order looks roughly alphabetical, but that is not necessarily so:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> set("ABCDEFG")
{'A', 'E', 'F', 'G', 'C', 'B', 'D'}
>>> set("ABCDEFG")
{'A', 'E', 'F', 'G', 'C', 'B', 'D'}

Same order for 2 different sets, not alphabetical. Now let's run it again:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> set("ABCDEFG")
{'C', 'G', 'D', 'B', 'E', 'A', 'F'}
>>>

A different order this time (related to the hash seed and Python's hash randomization security feature).

So, within the same instance of the interpreter, the set orders the different kinds of characters the same way every time, which creates a "pattern".
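Integers make this easier to see than characters. The following is a minimal sketch, not a guarantee of any particular order: the helper name `iteration_order` is made up for illustration, and the claim that small integers usually hash to themselves is a CPython implementation detail.

```python
import random

def iteration_order(values):
    """Return the order in which a set built from `values` is iterated."""
    return list(set(values))

values = [5, 3, 1, 4, 2]
shuffled = values[:]
random.shuffle(shuffled)

# Building the set twice from the same input gives the same iteration
# order within one interpreter session.
print(iteration_order(values))
print(iteration_order(values))

# On CPython, small integers usually hash to themselves, so a different
# insertion order tends to produce the same result as well
# (an implementation detail that must not be relied on).
print(iteration_order(shuffled))
```

The point is that the order comes from how the elements hash, not from the random order in which they were added.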

To benefit from the speed of set (O(1) average lookup vs O(n) for lists) and keep the chaotic order that random provided, you could use an auxiliary set to test, but store in a list:

import random

def append_until_length(acceptable, length=45):
    retval = []
    testset = set()
    for _ in range(10000):
        char = random.choice(acceptable)
        if char not in testset:  # fast lookup (O(1))
            retval.append(char)  # add to the result list
            testset.add(char)    # add to the set
        if len(retval) == length:
            return ''.join(retval)
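As an aside, if the goal is simply `length` distinct characters in random order, the standard library's `random.sample` may be enough on its own. This is a different technique from the loop above, shown only as a sketch: the function name `sample_unique` is hypothetical, and `acceptable` is rebuilt here on the assumption that it matches the list described in the question.

```python
import random
import string

def sample_unique(acceptable, length=45):
    """Return `length` distinct characters from `acceptable` in random order."""
    if length > len(acceptable):
        raise ValueError("length exceeds the number of acceptable characters")
    # random.sample picks without replacement, so no character repeats
    # (as long as `acceptable` itself has no duplicates).
    return ''.join(random.sample(acceptable, length))

# Assumed reconstruction of the question's list: printable characters
# without the six trailing whitespace characters, plus a plain space.
acceptable = list(string.printable)[:-6] + [" "]
print(sample_unique(acceptable))
```

Unlike the retry loop, this cannot fall through and return None, but it raises if more characters are requested than are available.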
Jean-François Fabre
  • According to `help(set)` `class set(object) Build an unordered collection of unique elements.` – CertifcateJunky Apr 10 '18 at 15:04
  • unordered means: you cannot rely on the order. But there's one at least in your python version/implementation: https://stackoverflow.com/questions/12165200/order-of-unordered-python-sets – Jean-François Fabre Apr 10 '18 at 15:05
  • So if I did `set([1,2,3,4,5,6,7,8,9,10])` it should display in a greatest to least order? – CertifcateJunky Apr 10 '18 at 15:08
  • On my Python 3.4 version I get the exact same sorted list, least to greatest. Try it out. It's easier with integers, since the hash of an integer is most of the time the integer itself (for small ones, except -1) – Jean-François Fabre Apr 10 '18 at 15:09
  • Yeah I did try it out. I also tried out `set([1,2,2,2,5,5,5,5,58,58,58])` which displays as `set([1, 2, 58, 5])`. `set`s are freaking weird man lol. – CertifcateJunky Apr 10 '18 at 15:12