I need help generating a random text. I've been given a text of 25k words. From this text file I've calculated the probability of each single letter and each single word, to see which letters/words are used most.

Now I need to make another text of 500 letters. This text should follow the probabilities I calculated, and should be written using the letters I "found" in the first text.

It's like: Text1 -> compute the probabilities of the letters used (which letters occur most). Make text2 -> use the probabilities you found from text1.

Hope you can help me, I'm new to Python.

py.codan
  • It's not really clear what the expected output is. Should the resulting text of 500 characters have the same letter frequencies? Or both word and letter frequencies? – Jarlax Jan 13 '15 at 22:35
  • Letter frequencies. Sorry for my haze. – py.codan Jan 13 '15 at 22:38
  • Try using https://pypi.python.org/pypi/fake-factory, it's pretty good. – James Sapam Jan 13 '15 at 23:06
  • Programming Pearls (by Jon Bentley) has a very good section about random text generation. You can read it here http://netlib.bell-labs.com/cm/cs/pearls/sec153.html – Hesham Attia Jan 14 '15 at 01:05

3 Answers

4

The easiest approach is to pick characters at random from the 25k-word text itself. Each character is then drawn with the same probability as its relative frequency in the original.

import random
# original_text is assumed to hold the full contents of the 25k-word file as one string
print(''.join(random.choice(original_text) for _ in range(500)))
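To see that sampling this way roughly reproduces the source frequencies, here is a minimal sketch. The helper name `sample_like` and the toy `original_text` are made up for illustration; in practice `original_text` would be the contents of the real file.

```python
import random
from collections import Counter

def sample_like(source, length=500):
    """Draw `length` characters uniformly (with replacement) from `source`."""
    return ''.join(random.choice(source) for _ in range(length))

# Hypothetical stand-in for the 25k-word text: 'a' is 50%, 'b' 37.5%, 'c' 12.5%.
original_text = "aaaabbbc" * 1000

sample = sample_like(original_text)
counts = Counter(sample)

# Relative frequency of each character in the generated sample.
for letter in sorted(counts):
    print("%s: %.3f" % (letter, counts[letter] / float(len(sample))))
```

Note that spaces, digits and punctuation in the source are sampled too. If only letters are wanted, one option is to filter the source first, for example with `''.join(c for c in original_text if c.isalpha())`.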
Jakube
0

You could do something like this:

import string
import random

def get_random_letter():
    # Placeholder: picks uniformly; replace with your own probability model
    return random.choice(string.ascii_letters)

random_letters = []
for i in range(500):
    random_letter = get_random_letter()
    random_letters.append(random_letter)

with open("text.txt", 'w') as f:
    f.write("".join(random_letters))

You would change the "get_random_letter" definition to match your probability model and return the chosen character. In that case you may not need to import random or string; they are only used here as an example.

Edit: To get the letter based on a certain weight you could use this:

import random

inputs = ['e', 'f', 'g', 'h']
weights = [10, 30, 50, 10]

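def get_random_letter(inputs, weights):
    # Pick a point between 0 and the total weight, then walk the cumulative sums
    r = random.uniform(0, sum(weights))
    current_cutoff = 0
    for letter, weight in zip(inputs, weights):
        current_cutoff += weight
        if r < current_cutoff:
            return letter
    # r can land exactly on the total in rare cases; fall back to the last letter
    return inputs[-1]

print(get_random_letter(inputs, weights))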

which is derived from the post here: Returning a value at random based on a probability weights
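To connect this to the question, the weights can be built from the source text itself. The following is only a sketch under some assumptions: it needs Python 3.6 or later, where the standard library's `random.choices` accepts weights directly, and the helper names `letter_weights` and `random_text` as well as the sample `source` string are hypothetical.

```python
import random
from collections import Counter

def letter_weights(text):
    """Return (letters, counts) for the alphabetic characters in `text`."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    letters = sorted(counts)
    return letters, [counts[ch] for ch in letters]

def random_text(letters, weights, length=500):
    """Draw `length` letters, each with probability proportional to its weight."""
    return ''.join(random.choices(letters, weights=weights, k=length))

# Hypothetical sample standing in for the 25k-word source text.
source = "the quick brown fox jumps over the lazy dog " * 100

letters, weights = letter_weights(source)
generated = random_text(letters, weights)
print(generated)
```

Raw counts work as weights because only their ratios matter, so there is no need to normalise them to probabilities first.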

Alan Liang
  • This doesn't answer the question at all. Basically py.codan asks for an implementation of the `get_random_letter()` method. – Jakube Jan 13 '15 at 22:42
  • So if I want to use my letters, I should change random_letter to my letter_freqs? Or am I wrong? Thanks for the fast answer. – py.codan Jan 13 '15 at 22:43
  • There may be a problem with this approach. It will give the same probabilities of letter occurrence, but the frequencies will be different. Consider an input with 200 occurrences of 'a' and 800 occurrences of 'b'. If the requirement means frequencies instead of probabilities (it's not 100% clear from the question what is expected), the result should be a randomly shuffled array with exactly 100 'a' and 400 'b'. In your example it will have on average 100 'a' and 400 'b'. – Jarlax Jan 13 '15 at 22:48
  • It should be from probabilities, that's my frequencies_letter =[] – py.codan Jan 13 '15 at 23:17
0

I now have this:

import random

def random_text():
    # text is assumed to hold the source text read earlier
    return ''.join(random.choice(text) for _ in range(500))

random_letters = []

for i in range(1):
    random_letter = random_text()
    random_letters.append(random_letter)

print(random_letters)

Now it only runs once. But I don't know how to encode the output text as UTF-8?
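One possible way to handle the encoding is to write the generated string to a file opened with an explicit encoding. This is only a sketch: the sample `text` and the output file name are made up, and a temporary directory is used just to keep the example self-contained.

```python
import io
import os
import random
import tempfile

# Hypothetical source text containing non-ASCII letters.
text = u"smørrebrød æble ål " * 50

random_text = u''.join(random.choice(text) for _ in range(500))

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "random_text.txt")

# io.open encodes the string as UTF-8 on write (works on Python 2 and 3).
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(random_text)

# Read the file back with the same encoding to check the round trip.
with io.open(out_path, encoding="utf-8") as f:
    read_back = f.read()
```

In Python 2, printing a list shows the escaped repr of each string, so printing the joined string itself rather than the list may already make the output look right.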

py.codan
  • random_text creates 1000 random letters, and your for loop creates 500 of those. Therefore it produces 1000*500 = 500,000 letters. Change the 1000 to 500 and only call `random_text` once. – Jakube Jan 13 '15 at 23:11
  • so it should be: def random_text(): return(''.join(random.choice(text) for _ in range(500))) random_letters = [] for i in range(1): random_letter = random_text() random_letters.append(random_letter) print random_letters That's how it works! Thanks mate!! – py.codan Jan 13 '15 at 23:23
  • Well, you don't need a loop if you execute the code just once. – Jakube Jan 13 '15 at 23:29