3

I am preparing test data that must contain different letters a set number of times, say அ 20 times, ம 30 times, த 40 times, and so on. (They are Tamil letters, encoded as UTF-8.) This could be achieved using a print statement:

print((' ம் ' * 30) + (' த ' * 40))  # ... and so on for the other letters

But I need to scramble them so that they don't appear in any particular order. I have around 230+ letters, each of which I would print 20, 30 or 40 times. Then I need to scramble them and write them to an output file. Any help in this regard would be appreciated.

sibi kanagaraj

4 Answers

3

There are several ways you could go about this. The simplest all use the standard library's random module.

random.shuffle

>>> from random import shuffle
>>> my_string = list('This is a test string.')
>>> shuffle(my_string)
>>> scrambled = ''.join(my_string)
>>> print(scrambled)
.sTtha te s rtisns gii

For this, you must first create a list from the characters of the string, because shuffle rearranges its argument in place and strings are immutable.

A new object has to be created if a different value has to be stored.
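As a minimal sketch of that point (written for Python 3, which is an assumption; the variable names are illustrative), passing a string straight to shuffle fails, while shuffling a list of its characters works and leaves the original string untouched:

```python
import random

text = "This is a test string."

# shuffle swaps items in place, which a str does not allow
try:
    random.shuffle(text)
except TypeError as exc:
    print("cannot shuffle a str:", exc)

# Copy the characters into a list, shuffle the list, then join it back
chars = list(text)
random.shuffle(chars)
scrambled = "".join(chars)
print(scrambled)
```

The original string is unchanged afterwards; only the list copy is reordered.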

random.sample

>>> from random import sample
>>> my_string = 'This is a test string.'
>>> scrambled = sample(my_string, len(my_string))
>>> scrambled = ''.join(scrambled)
>>> print(scrambled)
gr.s i tisstheit Tn sa

You don't have to create a list for this because, according to the random.sample documentation, it:

Returns a new list containing elements from the population while leaving the original population unchanged.

The sorted built-in with random.random

>>> from random import random
>>> my_string = 'This is a test string.'
>>> scrambled = sorted(my_string, key=lambda i: random())
>>> scrambled = ''.join(scrambled)
>>> print(scrambled)
ngi rts ithsT.staie s 

You don't need a list for this either. From the sorted documentation:

Return a new sorted list from the items in iterable.

Because a string is treated as an iterable (see below) in Python, sorted can be used on it.

An iterable is defined as

An object capable of returning its members one at a time.

Community
Zach Gates
  • I tried random.shuffle. While it is a good option for the English alphabet, it behaves differently for Tamil. For example, if கா is present in the original text, it is represented as 2 characters, so shuffle splits them apart and starts printing junk. That's the problem here. – sibi kanagaraj Apr 16 '15 at 07:10
  • I would recommend the third suggestion then. The sorted builtin. @sibikanagaraj – Zach Gates Apr 16 '15 at 07:12
  • The 3rd suggestion is the one I am using: `# coding=utf-8 from random import random my_string = 'அ ஆ ந் ' scrambled = sorted(my_string, key=lambda i: random()) scrambled = ''.join(scrambled) print(scrambled)` But the output is again junk characters. – sibi kanagaraj Apr 16 '15 at 07:26
  • Can you give an example of these so-called junk characters? @sibikanagaraj – Zach Gates Apr 16 '15 at 19:06
  • First I ran it without # -*- coding: utf-8 -*- and it gave the error: SyntaxError: Non-ASCII character '\xe0' in file stack.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details. Next, I added the UTF-8 coding line. Now the output is ������ ��� – sibi kanagaraj Apr 18 '15 at 05:59
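The problem raised in these comments can be sketched as follows: a Tamil letter such as கா is a base consonant followed by a combining vowel sign, so shuffling individual code points can separate the two. One possible workaround, assuming Python 3 and using only the standard unicodedata module, is to group each base character with the combining marks that follow it before shuffling. The helper names split_letters and scramble are hypothetical, and this is a simplification rather than full Unicode grapheme segmentation:

```python
import random
import unicodedata


def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in text:
        # Categories Mn/Mc/Me are combining marks (Tamil vowel signs, pulli)
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def scramble(text, rng=random):
    """Shuffle whole letters instead of individual code points."""
    letters = split_letters(text)
    rng.shuffle(letters)
    return "".join(letters)


sample_text = "காம்த"
print(len(sample_text))            # number of code points
print(split_letters(sample_text))  # whole letters
print(scramble(sample_text))
```

Here the five code points are grouped into three letters, so a vowel sign or pulli always travels with its consonant.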
2

I suggest splitting this problem into 3 parts: assemble your list of letters, shuffle the list, then write it to a file. Note that the first line of the following code must be at the top of your Python file to allow you to use UTF-8 characters in the source code itself.

# -*- coding: utf-8 -*-

import codecs  # To write UTF-8 characters to a file
import random

# Assemble data list    
letters = [u'அ', u'ம', u'த']
data = []  # This list will hold the shuffled data

for current_letter in letters:
    # Choose how many times to repeat the current letter.
    times_repeated = random.choice([20, 30, 40])
    data.extend([current_letter] * times_repeated)

# Now, shuffle the 'data' list
random.shuffle(data)

# Now write the shuffled list to a file as one continuous string
data_string = "".join(data)

with codecs.open("data.txt", "w", "utf-8") as f:
    f.write(data_string)

Note that if you know how many times you want each letter to show up, you could put that information into a dictionary instead of choosing randomly from [20, 30, 40]:

# The key is the letter to repeat, the value is the number of times to repeat it
letters = {u'அ': 20,
           u'ம': 30,
           u'த': 20}

for letter in letters:
    times_repeated = letters[letter]
    # ... rest of the code would look the same ...
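Put together, the dictionary variant might look like the sketch below. It is written for Python 3 (an assumption; the answer's code targets Python 2), where strings are Unicode by default and the built-in open accepts an encoding. The function names build_scrambled and write_scrambled are hypothetical, and the counts follow the example in the question:

```python
import random


def build_scrambled(counts, rng=random):
    """Repeat each letter the requested number of times, then shuffle."""
    data = []
    for letter, times in counts.items():
        data.extend([letter] * times)
    rng.shuffle(data)
    return data


def write_scrambled(path, counts):
    """Write the shuffled letters to a UTF-8 file as one continuous string."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(build_scrambled(counts)))


# The key is the letter to repeat, the value is the number of repetitions
counts = {"அ": 20, "ம": 30, "த": 40}
data = build_scrambled(counts)
print(len(data))

if __name__ == "__main__":
    write_scrambled("data.txt", counts)
```

Because each list element is a whole string, a multi-code-point letter such as 'ம்' can be used as a key and will stay intact through the shuffle.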
Zachary Cross
  • In your `for` loop, it would be better to use `letters.get(letter, 0)`. – Zach Gates Apr 15 '15 at 19:39
  • @zachgates7 While using this code, I get the error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128) – sibi kanagaraj Apr 16 '15 at 07:30
  • @sibikanagaraj I've updated the answer to support utf-8 characters in the source code itself, which should fix your issue. Specifically, I added an encoding line at the top and I changed the declaration of the `letters` list. – Zachary Cross Apr 16 '15 at 08:37
0

Credit to my friend @AswinMurugesh, who helped me with the code.

The following code did the trick.

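import codecs
import tamil  # third-party module that splits text into Tamil letters
from random import shuffle

# Read the input file as Unicode text
with codecs.open("/home/sibi/Desktop/scramble.txt", encoding="utf-8") as inp_file:
    inp_text = inp_file.read().rstrip()

# Split into whole letters so a consonant and its vowel sign stay together
tamil_letters = tamil.utf8.get_letters(inp_text)
shuffle(tamil_letters)
tamil_letters = "".join(tamil_letters).encode("utf-8")
print tamil_letters

# Write the UTF-8 encoded result; the with block closes the file
with open('outputscrambled.txt', 'w') as out_file:
    out_file.write(tamil_letters)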
sibi kanagaraj
-1

Just use random.choice:

import random
size = 1000
values = [' ம் ', ' த ', ' த ']
print "".join(random.choice(values) for i in xrange(size))
Alexandru Chirila