
I have the following data set (this is a sample):

ID      Sub1    Sub2    Sub3    Sub4
Creb3l1 10.14   9.67    10.14   10.42
Chchd6  11.25   10.74   10.80   11.07
Arih1   9.91    9.25    10.20   9.34
Prpf8   11.54   11.58   11.14   11.36
Rfng    11.71   11.56   10.81   10.72
Rnf114  12.66   12.60   12.59   12.56

I want to apply the Fisher-Yates shuffle to this data set 10 times (i.e. write 10 output files, each containing one randomisation of the data produced by the Fisher-Yates shuffle).

I wrote this code:

import sys
import itertools
from itertools import permutations

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    ID = line[0]
    expression_values = line[1:]
    for shuffle in permutations(expression_values):
        print shuffle

The output for this code is like this (sample):

('11.25', '10.74', '10.80', '11.07')
('11.25', '10.74', '11.07', '10.80')
('11.25', '10.80', '10.74', '11.07')
('11.25', '10.80', '11.07', '10.74')
('11.25', '11.07', '10.74', '10.80')
('11.25', '11.07', '10.80', '10.74')
('10.74', '11.25', '10.80', '11.07')
('10.74', '11.25', '11.07', '10.80')
('10.74', '10.80', '11.25', '11.07')
('10.74', '10.80', '11.07', '11.25')
('10.74', '11.07', '11.25', '10.80')
('10.74', '11.07', '10.80', '11.25')
('10.80', '11.25', '10.74', '11.07')
('10.80', '11.25', '11.07', '10.74')
('10.80', '10.74', '11.25', '11.07')
('10.80', '10.74', '11.07', '11.25')
('10.80', '11.07', '11.25', '10.74')
('10.80', '11.07', '10.74', '11.25')
('11.07', '11.25', '10.74', '10.80')
('11.07', '11.25', '10.80', '10.74')
('11.07', '10.74', '11.25', '10.80')
('11.07', '10.74', '10.80', '11.25')
('11.07', '10.80', '11.25', '10.74')
('11.07', '10.80', '10.74', '11.25')
('9.91', '9.25', '10.20', '9.34')
('9.91', '9.25', '9.34', '10.20')

The part I am having trouble with is producing the blocks of randomised data (e.g. a block of 7 Fisher-Yates randomised lines that I can write to a file). If someone could show me how to edit the above code to produce 10 output files, each containing 7 lines of text (i.e. the same number as the input file) with a Fisher-Yates shuffled set of values, I would appreciate it.

Edit 1: I have tried a few different ways: For example this code:

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    gene_name = line[0]
    expression_values = line[1:]
    RandomList = []
    for shuffle in permutations(expression_values):
        while len(RandomList) <10:
            RandomList.append(shuffle)
    print RandomList

I thought this would give me back 10 randomisations of each line. Instead, it gives me back the same ordering 10 times for each line:

[('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07')]
[('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34')]
[('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36')]
[('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72')]
[('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56')]

Edit 2: Shaun, thank you so much for your help. I do know how to write to files in general, e.g. I could say:

for i in range(10):
    output_file = "random." + str(i)
    open_output_file = open(output_file, 'a')
    ***for each line of the randomised array***:
        open_output_file.write(line + "\n")
    open_output_file.close()

My problem is that I can't even get the output I want printed to the screen first. For example, if I run this code:

import sys
import itertools
from itertools import permutations

for i in range(10):
    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        gene_name = line[0]
        expression_values = line[1:]
        for shuffle in permutations(expression_values):
            print shuffle[:6]
        print "***"
i +=1

I would want the output to be something like 7 randomised lines, followed by "***", followed by 7 randomised lines, and so on, 10 times. But it prints all of the permutations of each line.

user1288515
  • Which part are you stuck on? Getting groups of seven? Writing them to a file? All of these things have answers. – jonrsharpe Dec 04 '16 at 17:36
  • Thank you, I've edited the question. Yes, the output I get is 120 lines printed to screen/written to file. I'm confused as to how to get groups of 7, e.g. to print out at a time one set, of 7 lines, to write to file (and then to do that 10 times). – user1288515 Dec 04 '16 at 17:40
  • And what have you tried? Making a list, perhaps? Acting when it reaches the appropriate length? If you've made an effort, show it. If you haven't, make one! Or just [do some research](http://stackoverflow.com/questions/3992735/python-generator-that-groups-another-iterable-into-groups-of-n). – jonrsharpe Dec 04 '16 at 17:41
  • Python code requires white space. It is all but unreadable in comments. [Edit] the question, or just read the link I just provided. – jonrsharpe Dec 04 '16 at 17:55

3 Answers


I think I have a solution:

import sys
from itertools import islice, permutations

lines = open(sys.argv[1]).readlines()

# Write the header line to each of the 10 output files
for i in range(10):
    file_name = "random" + str(i) + ".txt"
    with open(file_name, 'a') as out:
        out.write(lines[0].strip() + "\n")

# Append each data row to the 10 files, one permutation per file
for line in lines:
    if "Sub" not in line:
        fields = line.strip().split()
        ID = fields[0]
        expression_values = fields[1:]
        # Take the first 10 permutations without building all of them
        first_ten = islice(permutations(expression_values), 10)
        for ind, shuffle in enumerate(first_ten):
            file_name = "random" + str(ind) + ".txt"
            with open(file_name, 'a') as out:
                out.write(ID + "\t" + "\t".join(shuffle) + "\n")
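
One caveat: itertools.permutations yields orderings in a fixed, deterministic order, so taking the first ten gives the same ten orderings on every run rather than random ones. As a possible alternative, here is a minimal sketch that shuffles the values of each row with random.shuffle (which CPython implements as a Fisher-Yates shuffle). It is written for Python 3, unlike the Python 2 code above. The function name write_shuffled_files, its parameters, and the tab-separated output are assumptions made for illustration.

```python
import random


def write_shuffled_files(input_path, n_files=10, prefix="random", seed=None):
    """Write n_files copies of the table, shuffling the values within each row.

    Hypothetical helper: the name, parameters and tab-separated output
    format are assumptions made for this sketch.
    """
    rng = random.Random(seed)  # pass a seed for reproducible output

    # Read the whole table once, skipping blank lines
    with open(input_path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    header, data = rows[0], rows[1:]

    paths = []
    for i in range(n_files):
        path = "%s%d.txt" % (prefix, i)
        with open(path, "w") as out:
            out.write("\t".join(header) + "\n")
            for row in data:
                values = row[1:]     # slicing copies, so the original row is untouched
                rng.shuffle(values)  # in-place shuffle of this row's values
                out.write("\t".join([row[0]] + values) + "\n")
        paths.append(path)
    return paths
```

Calling write_shuffled_files(sys.argv[1]) would then produce random0.txt to random9.txt, each with the header plus one independently shuffled version of every row.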
user1288515
import random

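def shuffle(ary):
    """Shuffle ary in place with the Fisher-Yates algorithm and return it."""
    # Walk from the last index down to 1; index 0 needs no swap
    for d in range(len(ary) - 1, 0, -1):
        e = random.randint(0, d)  # random index in 0..d, both ends included
        if e == d:
            continue  # the element stays where it is
        ary[d], ary[e] = ary[e], ary[d]
    return ary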

The Fisher-Yates shuffle walks through the list from the last position down to the second. At each position d it picks a random index e between 0 and d (inclusive) and swaps the two elements, so that position d receives a randomly chosen element from those not yet placed. Once every position has been handled, the list has been shuffled in place.
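
To make the procedure concrete, here is a small sketch of the same algorithm that leaves its input untouched and accepts an optional seed, which can help when testing. The name fisher_yates_copy and the seed parameter are assumptions for illustration, not part of the code above.

```python
import random


def fisher_yates_copy(items, seed=None):
    """Return a shuffled copy of items (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    result = list(items)
    # Walk from the last index down to 1; index 0 needs no swap.
    for last in range(len(result) - 1, 0, -1):
        pick = rng.randint(0, last)  # randint includes both endpoints
        result[last], result[pick] = result[pick], result[last]
    return result


row = ["11.25", "10.74", "10.80", "11.07"]
shuffled = fisher_yates_copy(row, seed=42)
print(sorted(shuffled) == sorted(row))  # True: same values, possibly reordered
```

Returning a copy means the same input row can be shuffled repeatedly, once per output file, without the earlier shuffles affecting later ones.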

See also: http://code.activestate.com/recipes/360461-fisher-yates-shuffle/

Wolf
  • Please include some explanations as well? Links are prone to change or be removed. Future readers may not be able to access the information in the link you provided. – Ishita Sinha Oct 03 '18 at 10:10
  • ... as poor as the ActiveState "recipe" -- better read the explanations that Wikipedia provides https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle – Wolf Mar 20 '19 at 15:16

"each file containing 7 lines of text"

Sounds like you want to do array slicing.

a = [ 1, 2, 3, 4, 5, 6 ]
a[:3]

will yield [1, 2, 3]

A slice takes a start index, an end index (exclusive) and an optional step. In a[:3] the start is omitted, so the slice runs from index 0 up to, but not including, index 3.

a[1:3] would yield [2, 3]

a[1:5:2] would start at index 1, stop before index 5, and take every second element. So it would yield [2, 4]
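
The slices above can be checked directly. This is a small sketch for Python 3; the last line is an extra case added for illustration, showing that an end index past the length of the list is allowed.

```python
a = [1, 2, 3, 4, 5, 6]

print(a[:3])     # [1, 2, 3]  start omitted, so it begins at index 0
print(a[1:3])    # [2, 3]     index 3 itself is excluded
print(a[1:5:2])  # [2, 4]     indices 1 and 3
print(a[:7])     # [1, 2, 3, 4, 5, 6]  an end past the list length is allowed
```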

So in your example, it looks like you'd want to write shuffle[:7] to take the first seven items.

As for writing the files, you'll need some kind of loop:

for i in range(0, 10):
    filename = "output-%s.txt" % i

That will produce the filenames output-0.txt, output-1.txt, and so on up to output-9.txt.

Read https://docs.python.org/2/tutorial/inputoutput.html about file input/output. Basically, you should use the with statement together with open:

with open(filename, 'w') as f:
    f.write(str(shuffle[:7]))

This should point you in the right direction.

Shaun