
I have a 321 MB text file containing several different books. I want to shuffle it by splitting it into sections of 400 characters, permuting the order of those sections, and then writing them back to a file.

The following code is my attempt, but it does not produce the result I expect:

    import numpy as np

    with open ('/home/gabriel/Desktop/GOD/Data/all_no_mix.txt','r') as fr:
        chunks = []
        char_len = 400
        data = fr.read()
        for i in range(0,len(data),400):
            chunks.append(data[i:char_len])
        fr.close()

    with open ('/home/gabriel/Desktop/GOD/Data/all_mix.txt','a') as fw:
        num_chunks = len(chunks)
        order = np.random.permutation(num_chunks)
        for i in order:
            fw.write(chunks[i])
        fw.close()

It only writes the first 400 chars of the file all_no_mix.txt to all_mix.txt.

What am I missing?

GGS

1 Answer

Within the reading loop you wrote:

    chunks.append(data[i:char_len])

You want:

    chunks.append(data[i:i+char_len])
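The effect of the original slice is easy to see on a short string. This sketch uses a chunk length of 4 instead of 400 purely so the output stays readable; the sample string is made up for the illustration:

```python
data = "abcdefghij"
char_len = 4

# Buggy: the stop index is always char_len, so every slice after the first is empty
buggy = [data[i:char_len] for i in range(0, len(data), char_len)]

# Fixed: the stop index moves forward together with the start index
fixed = [data[i:i + char_len] for i in range(0, len(data), char_len)]

print(buggy)  # ['abcd', '', '']
print(fixed)  # ['abcd', 'efgh', 'ij']
```

This is exactly why only the first 400 characters ended up in the output: every other chunk in the list was an empty string.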

Additionally, the step argument to range should use char_len rather than the hard-coded 400:

    for i in range(0, len(data), char_len):
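Keeping the step symbolic matters as soon as the chunk size changes. In this illustrative sketch (the sample data and the chunk length of 100 are made up for the demonstration), a step left at the literal 400 silently skips most of the text:

```python
data = "".join(str(i % 10) for i in range(1000))  # 1000 characters of sample text
char_len = 100

# Hard-coded step: after changing char_len, 300 of every 400 characters are skipped
mismatched = [data[i:i + char_len] for i in range(0, len(data), 400)]

# Symbolic step: the slice width and the step always agree
matched = [data[i:i + char_len] for i in range(0, len(data), char_len)]

print(sum(map(len, mismatched)))  # 300
print("".join(matched) == data)   # True
```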

Also, the NumPy docs recommend that new code call permutation on a Generator instead of using the legacy np.random.permutation function:

    rng = np.random.default_rng()
    order = rng.permutation(num_chunks)
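Putting the pieces together, a corrected version of the whole script might look like the following sketch. The function name shuffle_file and its seed parameter are hypothetical additions for illustration. It also opens the output with mode "w" instead of "a", which is an assumption about intent ("a" appends to whatever a previous run left in all_mix.txt), and it drops the explicit close() calls because the with blocks already close the files.

```python
import numpy as np


def shuffle_file(src_path, dst_path, char_len=400, seed=None):
    """Split src_path into char_len-character chunks, shuffle them, write to dst_path.

    shuffle_file and seed are hypothetical names used for illustration.
    """
    with open(src_path, "r") as fr:
        data = fr.read()

    # Fixed slicing: both the step and the slice width use char_len
    chunks = [data[i:i + char_len] for i in range(0, len(data), char_len)]

    # Generator-based permutation, as recommended for new NumPy code
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(chunks))

    # Assumption: "w" overwrites the output; the original "a" would append to it
    with open(dst_path, "w") as fw:
        for i in order:
            fw.write(chunks[i])
    return order
```

For the files in the question it would be called as shuffle_file('/home/gabriel/Desktop/GOD/Data/all_no_mix.txt', '/home/gabriel/Desktop/GOD/Data/all_mix.txt'). Passing a fixed seed makes the shuffle reproducible. Like the original, it still reads the whole 321 MB file into memory at once.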

J_H