python choose randomly from each line of a file

Question

I want to randomly choose 3 elements from each line of my text file:

6717108 8373270 8670842 8671024 8671040 
8671069 8672185 
8672302 8672317 8672363 8672481 8672533 8672550 8672587 
8672610 
8672611 8672640 8672661 8672684 8688747 8688760 8688777 8688792 8688827 
8688836 8688884 8689003 8689037 
8672233 8688891 8688908 
8688971 8689078

However, a line does not always have 3 elements; in that case it should take all of them. So the output would look like this (chosen randomly):

6717108 8670842 8671040 
8671069 8672185 
8672317 8672481 8672533
8672610 
8672611 8688747 8688760 
8688836 8689003 8689037 
8672233 8688891 8688908 
8688971 8689078

My attempt is the following:

random_list = []
with open('my_inputFile', "r") as myFile:
    for line in myFile.readlines():
        myparts = line.split(' ')
        random_list.append(np.random.choice(myparts, 3))

The output should be in list form.

Question:

My code doesn't handle lines with fewer than 3 elements, and its output is apparently not in list format.

after you initialize `myparts` throw in an `if` statement. `if len(myparts) <= 3: random_list.append(myparts) else: rest of your code` — gold_cy, Jun 02 '17 at 14:34
`random_list.append(np.random.choice(myparts, min(len(myparts), 3)))` — Phylogenesis, Jun 02 '17 at 14:36
`np.random.choice()` returns multiple independent choices from the population, so it can contain repeated elements. I think you want the `random.sample()` from the built-in `random` module instead. — Sven Marnach, Jun 02 '17 at 14:39
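
The suggestions in these comments can be combined: cap the sample size at the line length and draw without replacement. Below is a minimal sketch using only the standard library; the helper name `pick_up_to` is hypothetical and does not come from the thread.

```python
import random

def pick_up_to(parts, k=3):
    """Return up to k distinct elements of parts, drawn without replacement."""
    # min() keeps random.sample() from raising ValueError on short lines
    return random.sample(parts, min(len(parts), k))

line = "8672302 8672317 8672363 8672481 8672533 8672550 8672587"
print(pick_up_to(line.split()))        # three distinct values, in random order
print(pick_up_to("8672610".split()))   # ['8672610']
```

Note that a line with two or three elements comes back in shuffled order with this approach, since `random.sample()` does not preserve the input order.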

alexis · Answer 1 · 2017-06-02T15:10:39.503

You only need to draw randomly if you have more than three elements on a line. Also, your code samples "with replacement", meaning it's possible to draw the same value twice (or even three times). So add the flag `replace=False`, or better yet just use `random.sample()`; you don't need numpy for this.

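import random

random_list = []
with open('my_inputFile') as myFile:
    for line in myFile:
        myparts = line.split()
        # Sample only when the line has more than three elements
        if len(myparts) > 3:
            random_list.append(random.sample(myparts, 3))
        else:
            random_list.append(myparts)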

It's not clear what your problem is with the output, but the random_list you create this way is a list of lists. You can print it out however you want.
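
As one possible way to print such a list of lists in the same layout as the input file, each inner list can be joined with spaces. This is only a sketch: the sample data is copied from the question, and `format_rows` is a hypothetical helper name.

```python
# A small list of lists, shaped like the random_list built above
random_list = [
    ["6717108", "8670842", "8671040"],
    ["8671069", "8672185"],
    ["8672610"],
]

def format_rows(rows):
    """Join each inner list with spaces, one row per output line."""
    return "\n".join(" ".join(row) for row in rows)

print(format_rows(random_list))
```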

Psidom · Accepted Answer · 2017-06-02T14:46:31.370


You need the `extend` method if you want a flat list at the end, and you should also specify `replace=False` if you don't want to choose an item more than once:

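import numpy as np

random_list = []

with open('pathToFile/inputFile', 'r') as f:
    for line in f:
        # split() with no argument also drops the trailing newline
        myparts = line.split()
        if len(myparts) <= 3:
            # Short line: keep every element
            random_list.extend(myparts)
        else:
            # replace=False prevents drawing the same element twice
            random_list.extend(np.random.choice(myparts, 3, replace=False))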

random_list
['8670842',
 '6717108',
 '8671024',
 '8671069',
 '8672185',
 '8672363',
 '8672317',
 '8672587',
 '8672610',
 '8688827',
 '8672661',
 '8688792',
 '8688884',
 '8689037',
 '8689003',
 '8672233',
 '8688891',
 '8688908',
 '8688971',
 '8689078']

edited Jun 02 '17 at 14:46

answered Jun 02 '17 at 14:41


How can you save it to a file so that I can iterate over it later, checking whether my number is in this list? – passion Jun 02 '17 at 14:42
Writing a list to a file is pretty trivial; you can see [here](https://stackoverflow.com/questions/899103/writing-a-list-to-a-file-with-python). – Psidom Jun 02 '17 at 14:44
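
For the follow-up asked in these comments, one simple option is to write one value per line and later load the file into a set for membership checks. The sketch below makes several assumptions: the file name `random_list.txt` and the temporary directory are hypothetical, and the values are kept as strings, so lookups must use strings too.

```python
import os
import tempfile

random_list = ["8670842", "6717108", "8671024", "8671069", "8672185"]

# Hypothetical output path; a temporary directory keeps the sketch self-contained
out_path = os.path.join(tempfile.mkdtemp(), "random_list.txt")

# Write one value per line
with open(out_path, "w") as f:
    f.write("\n".join(random_list) + "\n")

# Later: read the values back into a set for fast membership tests
with open(out_path) as f:
    saved = {line.strip() for line in f if line.strip()}

print("8671069" in saved)   # True
print("1234567" in saved)   # False
```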

Gnudiff · Answer 3 · 2017-06-03T09:48:07.497


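import random

random_list = []
with open('my_inputFile', "r") as myFile:
    for line in myFile:
        # split() with no argument drops the trailing newline and empty strings
        myparts = line.split()
        l = len(myparts)
        if l < 4:
            random_list += myparts
        else:
            # Pop three random items so the same item cannot be drawn twice
            for i in range(3):
                random_list.append(myparts.pop(random.randrange(l)))
                l -= 1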

should do it.

edited Jun 03 '17 at 09:48

answered Jun 02 '17 at 14:40


You're sampling "with replacement", i.e. you'll often draw the same number twice. – alexis Jun 02 '17 at 14:45
No, I won't because I am popping off the items from the list. – Gnudiff Jun 02 '17 at 15:23
Oh I see. You're right about that, but it's a pretty convoluted and inefficient way to do this. (Also why do you keep taking the `len()`? You know how long it's going to be.) – alexis Jun 03 '17 at 08:37
You're right. This can be improved, for example, like this. In general, however, since we are talking about just 3 operations, I don't think the performance hit would be important. But I upvoted the accepted answer, since it seemed better. – Gnudiff Jun 03 '17 at 09:49
I'm not worried about micro-optimization, but it's just not a good way to go about it. It's three operations **per row,** we don't know how many rows there are, and you mangle the input data (which the OP might have more uses for). – alexis Jun 03 '17 at 15:31
true as well :) – Gnudiff Jun 03 '17 at 18:43
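
To illustrate the point about not mangling the input, here is a sketch that samples positions instead of popping values, so the original list is left untouched. It also keeps the picked elements in their original order, which is an assumption based on the sample output in the question; `pick_in_order` is a hypothetical name.

```python
import random

def pick_in_order(parts, k=3):
    """Pick up to k elements without replacement, keeping their original order."""
    if len(parts) <= k:
        return list(parts)   # return a copy so the caller's list is not shared
    # Sample positions rather than values, then sort the positions
    positions = sorted(random.sample(range(len(parts)), k))
    return [parts[i] for i in positions]

parts = "8672611 8672640 8672661 8672684 8688747".split()
picked = pick_in_order(parts)
print(picked)
print(len(parts))   # 5: the input list is left intact
```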
