I want to split a file into two random halves with Python. I have a small script, but it does not split the file exactly in two. Any suggestions?

import random
fin = open("test.txt", 'rb')
f1out = open("test1.txt", 'wb')
f2out = open("test2.txt", 'wb')
for line in fin:
    r = random.random()
    if r < 0.5:
        f1out.write(line)
    else:
        f2out.write(line)
fin.close()
f1out.close()
f2out.close()
November
  • Read all the lines into a list; shuffle the list; then write half the list to each file. – ekhumoro Mar 10 '18 at 19:09
  • unless the two random halves need to maintain order, which would require a little extra work after shuffling – avigil Mar 10 '18 at 19:25
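The order-preserving variant raised in the comment above could be sketched as follows. This is only an illustration: `split_preserving_order` and its `rng` parameter are hypothetical names, not part of the question's script. It picks exactly half of the line positions at random, then walks the lines in their original order.

```python
import random


def split_preserving_order(lines, rng=random):
    """Split lines into two random halves, keeping the original order in each.

    `rng` is a hypothetical parameter: the random module or a random.Random instance.
    """
    n = len(lines)
    # Choose exactly n // 2 distinct positions for the first half.
    chosen = set(rng.sample(range(n), n // 2))
    first = [line for i, line in enumerate(lines) if i in chosen]
    second = [line for i, line in enumerate(lines) if i not in chosen]
    return first, second
```

With an odd number of lines, the second half receives the extra line.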

2 Answers

Because each call to random.random() is independent, you cannot rely on it to produce the same number of results below 0.5 as above 0.5.

Instead, you could shuffle all the lines in a list, then keep a counter and alternate between the two files depending on whether it is even or odd:

file_lines = [line for line in fin]

random.shuffle(file_lines)

counter = 0
for line in file_lines:
    counter += 1
    if counter % 2 == 0:
        f1out.write(line)
    else:
        f2out.write(line)

You can use this pattern with any number of output files (10 in this example):

counter = 0
for line in file_lines:
    counter += 1
    if counter % 10 == 0:
        f1out.write(line)
    elif counter % 10 == 1:
        f2out.write(line)
    elif counter % 10 == 2:
        f3out.write(line)
    elif counter % 10 == 3:
        f4out.write(line)
    elif counter % 10 == 4:
        f5out.write(line)
    elif counter % 10 == 5:
        f6out.write(line)
    elif counter % 10 == 6:
        f7out.write(line)
    elif counter % 10 == 7:
        f8out.write(line)
    elif counter % 10 == 8:
        f9out.write(line)
    else:
        f10out.write(line)
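The same counter-modulo idea can be written without one branch per file by indexing into a list. The sketch below is an assumption-laden illustration rather than part of the original answer: `deal_round_robin`, `n_parts`, and `rng` are hypothetical names, and it works on an in-memory list of lines rather than on open files.

```python
import random


def deal_round_robin(lines, n_parts, rng=random):
    """Shuffle lines, then deal them out to n_parts lists in turn.

    `rng` is a hypothetical parameter: the random module or a random.Random instance.
    """
    shuffled = list(lines)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    parts = [[] for _ in range(n_parts)]
    for counter, line in enumerate(shuffled):
        # counter % n_parts cycles through 0, 1, ..., n_parts - 1
        parts[counter % n_parts].append(line)
    return parts
```

Each resulting list can then be written to its own file, for example with `f.writelines(part)`. The part sizes differ by at most one line.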
JacobIRR

random will not give you exactly half each time. If you flip a coin 10 times, you don't necessarily get 5 heads and 5 tails.

One approach would be to shuffle the lines first and then use the partitioning method described in "Python: Slicing a list into n nearly-equal-length partitions".
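To make the coin-flip point concrete, here is a small simulation. It is only a sketch, and `count_below_half` is a hypothetical helper name. For 10 fair draws, the probability of an exact 5/5 split is C(10, 5) / 2^10 = 252 / 1024, or roughly 25%, so most runs come out uneven.

```python
import random


def count_below_half(n_draws, rng):
    """Count how many of n_draws calls to rng.random() fall below 0.5."""
    return sum(1 for _ in range(n_draws) if rng.random() < 0.5)


rng = random.Random()
# Repeat the 10-draw experiment 1000 times and see how often it splits 5/5.
counts = [count_below_half(10, rng) for _ in range(1000)]
exact = sum(1 for c in counts if c == 5)
print("runs with exactly 5 of 10 below 0.5:", exact, "out of 1000")
```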

import random

N_FILES = 2

# Read every line first so the line count is known before opening outputs.
with open("test.txt", 'rb') as fin:
    lines = fin.readlines()
random.shuffle(lines)
n = len(lines)

# Never open more output files than there are lines.
n_out = min(N_FILES, n)
out = [open("test{}.txt".format(i), 'wb') for i in range(n_out)]

# Average partition size; max() guards against an empty input file.
size = n / float(max(n_out, 1))
partitions = [lines[int(round(size * i)):int(round(size * (i + 1)))] for i in range(n_out)]

for f, part in zip(out, partitions):
    f.writelines(part)

for f in out:
    f.close()

The code above splits the input file into N_FILES output files (N_FILES is a constant defined at the top) of approximately equal size, and never creates more output files than there are lines. Handling things this way lets you wrap the logic in a function that takes the number of output files as a parameter, so the code does not have to change for each case.
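As a rough illustration of that function form, here is one way it might look. The name `split_file` and the parameters `in_path`, `out_pattern`, `n_files`, and `rng` are all hypothetical, and the partition boundaries use integer floor division instead of the `round()` calls above, which also keeps the sizes within one line of each other.

```python
import random


def split_file(in_path, out_pattern, n_files, rng=random):
    """Shuffle the lines of in_path and write them to up to n_files files.

    out_pattern is a format string such as "test{}.txt" (hypothetical naming).
    Returns the list of paths that were written.
    """
    with open(in_path, "rb") as fin:
        lines = fin.readlines()
    rng.shuffle(lines)

    n = len(lines)
    n_out = min(n_files, n)  # never more files than lines
    out_paths = []
    for i in range(n_out):
        # Floor division gives contiguous slices whose sizes differ by at most one.
        start = i * n // n_out
        stop = (i + 1) * n // n_out
        path = out_pattern.format(i)
        with open(path, "wb") as fout:
            fout.writelines(lines[start:stop])
        out_paths.append(path)
    return out_paths
```

For example, `split_file("test.txt", "test{}.txt", 2)` would write `test0.txt` and `test1.txt`.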

avigil