
Being a long-time Matlab user, I am accustomed to getting a warning whenever I build a list/array/anything with multiple elements in a loop such that it changes size on every iteration, because that slows things down.

As I teach myself Python 2.7, I'm wondering if such a rule applies here to strings. I know exactly how long I want my string to be, and I have a specific list of the characters I want to build it from, but otherwise I want it to be random. My favorite code I've written so far is:

import numpy as np

def BernSeq(length,freq):
    """Create a Bernoulli sequence - a random bitstring - of the given length
    and with the given frequency of 1s"""
    seq = '0'*length  # start with all zeros
    for ii in range(length):
        num = np.random.rand(1)
        if num < freq:
            cha = '1'
            # str is immutable, so rebuild the string around position ii
            seq = seq[:ii] + cha + seq[ii+1:]
    return seq

I call this as BernSeq(20,.25) and I get the output '10001000000001011101'.

I already tried seq[ii] = '1', but, to put it in IPython's words, TypeError: 'str' object does not support item assignment.

So, am I doing this the most Pythonic way, or is there some sleight of hand that I haven't seen yet - maybe a random string or list generator to which I can directly give a list of characters I want it to randomly choose between, the probability I want each possibility to have, and how long I want this string or list to be?

(I've seen other questions about random strings, but while they do address how to make it the right length, they are generally trying to generate passwords, with all ASCII characters being equally likely. That's not what I want.)

Post169
  • Why are you wrapping `1` and `0` in quotes (`'`)? Why not just use a list? – Srini Mar 23 '18 at 19:48
  • Also, hopefully you know this already but Python 2 is dying soon(ish)! – miradulo Mar 23 '18 at 19:51
  • @SrinivasSuresh I'm pretty sure I want to work with an iterable, which `int` is not. – Post169 Mar 23 '18 at 20:01
  • @miradulo I'm pretty sure Python 2 won't die before a better version for mathematicians, scientists and engineers has come out – Post169 Mar 23 '18 at 20:03
  • @Post169 But lists are iterable. – glglgl Mar 23 '18 at 20:07
  • @Post169 Yes, but a list of ints is iterable. What I'm suggesting is that you store the ints in a list as opposed to a string. Essentially, translating `'1' * length` to `[1] * length` – Srini Mar 23 '18 at 20:08
  • @Post169 Can you please substantiate that claim for me? Python 2 support is dropping in 2020. Libraries like NumPy are planning to drop Python 2 support in 2019. Unless you are a scientist or engineer who doesn't use up-to-date scientific libraries, you might be in some trouble. – miradulo Mar 23 '18 at 20:28
  • Okay, I think I see the problem. I was trying to assign a scalar value to multiple positions in the list, like `list1[3:6] = 1`, and while that works in Matlab, it doesn't in Python. – Post169 Mar 23 '18 at 20:39
  • @Post169 Yes, Python is not Matlab. – miradulo Mar 23 '18 at 20:59
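For the slice-assignment point in the comments above, here is a minimal sketch (Python 3 syntax assumed) of the difference. A plain list needs an iterable on the right-hand side of a slice assignment. A NumPy array broadcasts a scalar over the slice, which is closer to what Matlab does.

```python
import numpy as np

# Plain Python list: a slice must be assigned an iterable
list1 = [0] * 8
try:
    list1[3:6] = 1  # Matlab-style scalar assignment
except TypeError as err:
    print("TypeError:", err)
list1[3:6] = [1] * 3  # replicate the scalar yourself
print(list1)  # [0, 0, 0, 1, 1, 1, 0, 0]

# NumPy array: the scalar is broadcast over the slice, as in Matlab
arr = np.zeros(8, dtype=int)
arr[3:6] = 1
print(arr.tolist())  # [0, 0, 0, 1, 1, 1, 0, 0]
```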

4 Answers


There are several ways of doing it.

First, you can create the correct elements from the start:

seq = "".join("1" if np.random.rand(1) < freq else "0" for _ in range(length))

But the very first question to ask is: what do you want as an output? Do you require it to be a string? Maybe you are ok with a list of booleans?

Then

seq = [np.random.rand() < freq for _ in range(length)]

would be enough as well.
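A possible vectorized variant, assuming NumPy is acceptable, draws all the random numbers in one call instead of one per element. This is closer to how one would write it in Matlab. The name bern_seq_vectorized is hypothetical.

```python
import numpy as np

def bern_seq_vectorized(length, freq):
    """Bernoulli bitstring from a single vectorized draw (hypothetical helper)."""
    # One call draws `length` uniforms in [0, 1); the comparison gives a boolean array
    mask = np.random.rand(length) < freq
    # Map True/False to '1'/'0' element-wise, then join into one str
    return "".join(np.where(mask, "1", "0"))

print(bern_seq_vectorized(20, 0.25))
```

If a list of booleans is all you need, `mask.tolist()` gives one directly.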

glglgl
  • I need an iterable, and it looks like booleans are not iterable. – Post169 Mar 23 '18 at 20:07
  • But lists (of booleans) are. – glglgl Mar 23 '18 at 20:08
  • Ah, thanks, you're right! (I was pessimistic after finding lists of ints not to be iterable.) – Post169 Mar 23 '18 at 20:11
  • @Post169 There you are wrong. All lists are iterable, regardless of their content. They iterate their elements in the order they are in the list. `for i in ["", 0, False, object(), [4], (42, 42)]: print(i)` prints an empty string, a 0, False, an `object()` object, a list with 4 as its element and a tuple with 42, 42 as its elements. Some of these are in turn iterable, others are not, but as these are not iterated over again, that's fine. – glglgl Mar 23 '18 at 20:19
  • In addition to what @glglgl said, any object that exports the `__iter__` method is iterable. A list is the container and is iterable. – Srini Mar 23 '18 at 20:21
  • Now I found another problem - I'm using these as dictionary keys, and it looks like [that mostly eliminates lists](https://stackoverflow.com/questions/7257588/why-cant-i-use-a-list-as-a-dict-key-in-python) of all kinds. (Yes, it says something about using memory location, but just putting a list itself in gives `TypeError: unhashable type: 'list'`) @SrinivasSuresh – Post169 Mar 23 '18 at 20:32
  • Then join the list into a string with the `"".join` method. Lists cannot be dictionary keys. This is because they are by nature mutable, are usually passed around by reference and it's not straightforward to implement a hashcode function for a list. More on that [here](https://wiki.python.org/moin/DictionaryKeys). – Srini Mar 23 '18 at 20:38
  • I found the `"".join` method puzzling until I looked it up and found a great explanation of it [here](https://stackoverflow.com/questions/1876191/what-exactly-does-the-join-method-do) – Post169 Mar 23 '18 at 21:00
  • @glglgl I tried the `"".join` method, and it looks like exactly the sort of thing recommended on the videos of PyCons that I've seen! – Post169 Mar 23 '18 at 21:04
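Following the comments on dictionary keys, here is a sketch of two hashable alternatives to a list. You can join the booleans into a bitstring, or convert the list to a tuple. The helper bern_bools is a hypothetical name, and random.random stands in for np.random.rand here. Python 3 syntax is assumed.

```python
import random

def bern_bools(length, freq):
    """List of booleans, each True with probability freq (hypothetical helper)."""
    return [random.random() < freq for _ in range(length)]

bits = bern_bools(20, 0.25)
counts = {}

try:
    counts[bits] = 1  # lists are mutable, hence unhashable
except TypeError as err:
    print(err)

# Option 1: join into a bitstring with "".join
key_str = "".join("1" if b else "0" for b in bits)
# Option 2: a tuple of booleans is immutable and hashable
key_tuple = tuple(bits)

counts[key_str] = counts.get(key_str, 0) + 1
counts[key_tuple] = counts.get(key_tuple, 0) + 1
print(counts)
```

The string key is more compact and easier to read when printed. The tuple key avoids converting back and forth if you need the booleans later.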

You could define a function that produces a random letter according to your distributional preferences, then call it as many times as you like.

import random

def my_letter():
    # "a" and "b" are each returned with probability 1/2
    a = random.randint(1,10)
    if a > 5:
        return "a"
    else:
        return "b"

Then for your length preferences:

my_str = ""
for x in range(length):
   my_str += my_letter()
William
  • That's two steps back from where I have it - I want it to take as few steps as possible and not change the size of the array. – Post169 Mar 23 '18 at 20:15
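Here is a sketch of the same letter-picking idea that avoids growing the string with += in a loop. It generates the letters and joins them once. The freq_a parameter and the names my_str and my_str_choices are hypothetical additions. random.choices exists only in Python 3.6+, not in Python 2.7.

```python
import random

def my_letter(freq_a=0.5):
    """Return "a" with probability freq_a, else "b" (freq_a is a hypothetical parameter)."""
    return "a" if random.random() < freq_a else "b"

def my_str(length, freq_a=0.5):
    # Build all letters in one pass and join once, instead of += in a loop
    return "".join(my_letter(freq_a) for _ in range(length))

def my_str_choices(length, freq_a=0.5):
    # Python 3.6+ only: random.choices draws k items with the given weights
    return "".join(random.choices("ab", weights=[freq_a, 1 - freq_a], k=length))

print(my_str(20, 0.25))
print(my_str_choices(20, 0.25))
```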

Strings in Python are immutable:

In [1]: a = '1' * 5

In [2]: a
Out[2]: '11111'

In [3]: type(a)
Out[3]: str

In [4]: a[2] = 'c'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-69ed835eb212> in <module>()
----> 1 a[2] = 'c'

TypeError: 'str' object does not support item assignment

In [5]: b = [1] * 5

In [6]: b
Out[6]: [1, 1, 1, 1, 1]

Here is your code adjusted to use a list of ints instead (minimal fixes only; style and other optimizations are left alone):

import numpy as np

def BernSeq(length,freq):
    """Create a Bernoulli sequence - a random bitstring - of the given length
    and with the given frequency of 1s"""
    seq = [0] * length  # preallocated list of ints
    for ii in range(length):
        num = np.random.rand(1)
        if num < freq:
            cha = 1
            seq[ii] = cha  # overwrite in place; append() would grow the list past length
    return seq
Srini
  • The way I'm using this needs iterables, and it looks like lists of `int`s are not iterable – Post169 Mar 23 '18 at 20:13
  • Lists of ints, lists of chars, lists of anything: all are iterable, because they are lists. https://stackoverflow.com/questions/13054057/confused-with-python-lists-are-they-or-are-they-not-iterators – Srini Mar 23 '18 at 20:16
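If you want something closer to a preallocated, fixed-size character buffer, one option is bytearray. It supports item assignment and can be decoded back to a str at the end. This is a swapped-in technique, not the list-of-ints approach from the answer above. Here is a minimal sketch with a hypothetical helper name:

```python
import random

def bern_seq_bytearray(length, freq):
    """Bernoulli bitstring built in a preallocated mutable buffer (hypothetical helper)."""
    buf = bytearray(b"0" * length)  # fixed size and mutable, like a preallocated array
    for ii in range(length):
        if random.random() < freq:
            buf[ii] = ord("1")  # item assignment works on bytearray, unlike str
    return buf.decode("ascii")  # convert to a str once, at the end

print(bern_seq_bytearray(20, 0.25))
```

The buffer never changes size, so there is no repeated string rebuilding. That said, the list-of-ints version is just as reasonable for short sequences.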

There is a binomial distribution in NumPy, np.random.binomial. A binomial draw with n=1 is exactly a Bernoulli trial, so sampling N such draws with p set to your desired frequency and then joining their string representations is faster than reinventing it yourself.

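import numpy as np

def bernouilli_str(N, one_freq):
    # N draws of Binomial(n=1, p=one_freq), i.e. Bernoulli trials, cast to 1-char strings
    return ''.join(np.random.binomial(1, one_freq, N).astype('U1'))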

Benchmark

In [112]: %timeit ''.join(np.random.binomial(1, 0.75, 10**6).astype('U1'))
637 ms ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [113]: %timeit "".join("1" if np.random.rand(1) < 0.75 else "0" for _ in range(10**6))
1.69 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
miradulo
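The question also asks about the general case: an arbitrary set of characters, a probability for each, and a length. Here is a minimal sketch that assumes np.random.choice with its p argument does the sampling. random_str is a hypothetical helper name. The probabilities must be non-negative and sum to 1, with one entry per character.

```python
import numpy as np

def random_str(chars, probs, length):
    """Draw `length` characters from `chars` with probabilities `probs` (hypothetical helper)."""
    # np.random.choice samples with replacement using the probabilities in p
    draws = np.random.choice(list(chars), size=length, p=probs)
    return "".join(draws)

print(random_str("01", [0.75, 0.25], 20))  # Bernoulli bitstring with P('1') = 0.25
print(random_str("ACGT", [0.1, 0.4, 0.4, 0.1], 20))
```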