I have the following data set (this is a sample):
    ID       Sub1   Sub2   Sub3   Sub4
    Creb3l1  10.14  9.67   10.14  10.42
    Chchd6   11.25  10.74  10.80  11.07
    Arih1    9.91   9.25   10.20  9.34
    Prpf8    11.54  11.58  11.14  11.36
    Rfng     11.71  11.56  10.81  10.72
    Rnf114   12.66  12.60  12.59  12.56
I want to apply the Fisher-Yates shuffle to this data set 10 times, i.e. write 10 output files, each containing one Fisher-Yates randomisation of the data.
I wrote this code:
    import sys
    import itertools
    from itertools import permutations

    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        ID = line[0]
        expression_values = line[1:]
        for shuffle in permutations(expression_values):
            print shuffle
The output for this code is like this (sample):
('11.25', '10.74', '10.80', '11.07')
('11.25', '10.74', '11.07', '10.80')
('11.25', '10.80', '10.74', '11.07')
('11.25', '10.80', '11.07', '10.74')
('11.25', '11.07', '10.74', '10.80')
('11.25', '11.07', '10.80', '10.74')
('10.74', '11.25', '10.80', '11.07')
('10.74', '11.25', '11.07', '10.80')
('10.74', '10.80', '11.25', '11.07')
('10.74', '10.80', '11.07', '11.25')
('10.74', '11.07', '11.25', '10.80')
('10.74', '11.07', '10.80', '11.25')
('10.80', '11.25', '10.74', '11.07')
('10.80', '11.25', '11.07', '10.74')
('10.80', '10.74', '11.25', '11.07')
('10.80', '10.74', '11.07', '11.25')
('10.80', '11.07', '11.25', '10.74')
('10.80', '11.07', '10.74', '11.25')
('11.07', '11.25', '10.74', '10.80')
('11.07', '11.25', '10.80', '10.74')
('11.07', '10.74', '11.25', '10.80')
('11.07', '10.74', '10.80', '11.25')
('11.07', '10.80', '11.25', '10.74')
('11.07', '10.80', '10.74', '11.25')
('9.91', '9.25', '10.20', '9.34')
('9.91', '9.25', '9.34', '10.20')
The specific part that I have trouble with is producing the blocks of randomised data (e.g. give me a block of 7 Fisher-Yates randomised lines that I can write to file). If someone could show me how to edit the above code to produce 10 output files, each containing 7 lines of text (i.e. the same number as the input file) and each holding a different Fisher-Yates shuffled set of values, I would appreciate it.
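For reference, the request above could be sketched roughly as follows. This is only a sketch under stated assumptions: the input has exactly one header line, the values are shuffled within each row (as the code above does), and the helper names `fisher_yates_shuffle`, `shuffle_rows` and `write_randomised_files`, as well as the `random.` output prefix, are made up for illustration. It is written to run on Python 3, unlike the Python 2 code above.

```python
import random


def fisher_yates_shuffle(values, rng=random):
    """Return a shuffled copy of `values` using the Fisher-Yates algorithm."""
    shuffled = list(values)
    # Walk from the last index down to 1, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def shuffle_rows(rows, rng=random):
    """Shuffle the values within each row, keeping the ID in first position."""
    return [[row[0]] + fisher_yates_shuffle(row[1:], rng) for row in rows]


def write_randomised_files(input_path, n_files=10, prefix="random."):
    """Write n_files copies of the input, each with freshly shuffled rows."""
    with open(input_path) as handle:
        lines = [line.split() for line in handle if line.strip()]
    # Assumption: the first line is the header, everything else is data.
    header, rows = lines[0], lines[1:]
    for i in range(n_files):
        with open(prefix + str(i), "w") as out:
            out.write(" ".join(header) + "\n")
            for row in shuffle_rows(rows):
                out.write(" ".join(row) + "\n")
```

Calling `write_randomised_files(sys.argv[1])` (after `import sys`) would then produce `random.0` to `random.9`, each with the same number of lines as the input. The standard library's `random.shuffle` could replace the hand-written loop if an in-place shuffle is acceptable.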
Edit 1: I have tried a few different approaches, for example this code:
    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        gene_name = line[0]
        expression_values = line[1:]
        RandomList = []
        for shuffle in permutations(expression_values):
            while len(RandomList) < 10:
                RandomList.append(shuffle)
        print RandomList
I thought this would give me back 10 randomisations of each line. Instead, it gives me back the same ordering 10 times for each line:
[('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07')]
[('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34')]
[('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36')]
[('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72')]
[('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56')]
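The repetition above happens because the `while` loop fills `RandomList` to 10 entries during the very first pass of the `for` loop, so only the first permutation is ever appended. Note also that `permutations` enumerates every ordering in a fixed sequence and does not randomise anything. A minimal sketch of drawing 10 random orderings of one line instead, assuming the standard library's `random.shuffle` is acceptable (the helper name `random_orderings` is hypothetical, and the code targets Python 3):

```python
import random


def random_orderings(values, n=10, rng=random):
    """Return n independently shuffled copies of `values`."""
    result = []
    for _ in range(n):
        copy = list(values)  # work on a copy so the input is left untouched
        rng.shuffle(copy)    # shuffles the copy in place
        result.append(copy)
    return result


expression_values = ['11.25', '10.74', '10.80', '11.07']
for ordering in random_orderings(expression_values):
    print(" ".join(ordering))
```

Because each ordering is drawn independently, the same ordering can appear more than once among the 10, which is usually what a randomisation test wants.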
Edit 2: Shaun: Thank you so much for your help. I do know how to write to files in general, e.g. I could write:
    for i in range(10):
        output_file = "random." + str(i)
        open_output_file = open(output_file, 'a')
        ***for each line of the randomised array***:
            open_output_file.write(line + "\n")
        open_output_file.close()
My problem is that I can't even get the output I want printed to the screen first. For example, if I run this code:
    import sys
    import itertools
    from itertools import permutations

    for i in range(10):
        for line in open(sys.argv[1]).readlines()[2:]:
            line = line.strip().split()
            gene_name = line[0]
            expression_values = line[1:]
            for shuffle in permutations(expression_values):
                print shuffle[:6]
        print "***"
        i += 1
I would want the output to be 7 randomised lines followed by "***", repeated 10 times. Instead, it prints every permutation of each line.
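The inner `for shuffle in permutations(...)` loop is what prints every ordering; a single shuffle per line would give one randomised line instead. A rough sketch of the intended screen output, assuming the header line is kept as the first of the 7 lines and using the sample data inline rather than `sys.argv[1]` (the names `SAMPLE` and `randomised_block` are hypothetical, and the code targets Python 3):

```python
import random

SAMPLE = """ID Sub1 Sub2 Sub3 Sub4
Creb3l1 10.14 9.67 10.14 10.42
Chchd6 11.25 10.74 10.80 11.07
Arih1 9.91 9.25 10.20 9.34
Prpf8 11.54 11.58 11.14 11.36
Rfng 11.71 11.56 10.81 10.72
Rnf114 12.66 12.60 12.59 12.56
"""


def randomised_block(text):
    """Return the header plus one shuffled version of every data line."""
    lines = text.strip().splitlines()
    block = [lines[0]]  # keep the header unchanged
    for line in lines[1:]:
        fields = line.split()
        values = fields[1:]
        random.shuffle(values)  # one random ordering per line
        block.append(" ".join([fields[0]] + values))
    return block


for i in range(10):
    for out_line in randomised_block(SAMPLE):
        print(out_line)
    print("***")
```

The `i += 1` in the loop above is not needed, since `range(10)` already advances `i` on each pass.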