To produce test data, I need to quickly create large files of random text. I have one solution, taken from here and given below:

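import random
import string

n = 1024 ** 2  # 1 MiB of text (one byte per ASCII letter)
# string.ascii_letters exists on Python 2 and 3; string.letters is Python 2 only
chars = ''.join([random.choice(string.ascii_letters) for _ in range(n)])

with open('textfile.txt', 'w') as f:
    f.write(chars)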

The problem is that this takes 653 ms to run, which is far too slow for my use case.

Is there a faster way to generate text files containing random text?

cs95
Jonas Adler

1 Answer

Create a numpy array of letters:

In [662]: letters = np.array(list(chr(ord('a') + i) for i in range(26))); letters
Out[662]: 
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
      dtype='<U1')

Use np.random.choice to draw n random samples from letters (equivalent to generating random indices from 0 to 25 and indexing letters with them):

np.random.choice(letters, n)
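The call above returns an array of one-character strings rather than a single string. A minimal sketch of one way to turn that array into text and write it out, assuming the same `textfile.txt` name used in the question (the variable names `sample` and `text` are illustrative only):

```python
import numpy as np

n = 1024 ** 2  # number of characters to generate

# The 26 lowercase letters as an array of one-character strings.
letters = np.array(list("abcdefghijklmnopqrstuvwxyz"))

# Draw n letters uniformly at random, with replacement.
sample = np.random.choice(letters, n)

# Join the one-character strings into one Python string.
text = "".join(sample)

with open("textfile.txt", "w") as f:
    f.write(text)
```

The join step adds some time on top of the sampling itself, so it may be worth timing the whole pipeline rather than `np.random.choice` alone.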

Timings:

In [664]: n = 1024 ** 2

In [701]: %timeit np.random.choice(letters, n)
100 loops, best of 3: 15.1 ms per loop

Alternatively,

In [705]: %timeit np.random.choice(np.fromstring(letters, dtype='<U1'), n)
100 loops, best of 3: 14.1 ms per loop
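Recent NumPy releases deprecate the binary mode of `np.fromstring` in favour of `np.frombuffer`, so the variant above may warn or fail on a current install. The following is a sketch of a similar idea under that assumption, not the code that was timed here: it samples raw ASCII byte values and writes them in binary mode, which avoids the join step. The names `alphabet`, `rng`, and `data` are illustrative, and `np.random.default_rng` requires NumPy 1.17 or later.

```python
import string

import numpy as np

n = 1024 ** 2  # number of bytes to generate

# View the ASCII letters as an array of byte values (uint8).
alphabet = np.frombuffer(string.ascii_letters.encode("ascii"), dtype=np.uint8)

# Generator API (NumPy >= 1.17); the legacy np.random.choice also works here.
rng = np.random.default_rng()
data = rng.choice(alphabet, size=n)

# Write the raw bytes; each uint8 element is one ASCII character.
with open("textfile.txt", "wb") as f:
    f.write(data.tobytes())
```

Because every element is a single byte, the resulting file is exactly `n` bytes long and can be read back as ordinary ASCII text.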
cs95
  • I am able to modify this somewhat and get an order of magnitude better performance: `np.random.choice(np.fromstring(string.letters, dtype='S1'), n)`, total time `17 ms`. Could you update the answer to that and I'll accept that answer? – Jonas Adler Jul 15 '17 at 21:10
  • @JonasAdler That gives you a list of chars, right? You'll want to join them together. – cs95 Jul 15 '17 at 21:13
  • It seems `f.write` accepts char arrays. The result looks alright and writing is basically instant. – Jonas Adler Jul 15 '17 at 21:13
  • @JonasAdler I got you a bit faster, if you don't mind the fact they're not binary strings. – cs95 Jul 15 '17 at 21:16
  • @JonasAdler Glad to help :) – cs95 Jul 15 '17 at 21:18