0

I want to randomly choose 3 elements from each line of my text file:

6717108 8373270 8670842 8671024 8671040 
8671069 8672185 
8672302 8672317 8672363 8672481 8672533 8672550 8672587 
8672610 
8672611 8672640 8672661 8672684 8688747 8688760 8688777 8688792 8688827 
8688836 8688884 8689003 8689037 
8672233 8688891 8688908 
8688971 8689078 

However, a line doesn't always have 3 elements; in that case all of them should be taken. So the output, with the elements chosen at random, would look like this:

6717108 8670842 8671040 
8671069 8672185 
8672317 8672481 8672533
8672610 
8672611 8688747 8688760 
8688836 8689003 8689037 
8672233 8688891 8688908 
8688971 8689078 

My attempt is the following:

random_list = []
with open('my_inputFile', "r") as myFile:
    for line in myFile.readlines():
        myparts = line.split(' ')
        random_list.append(np.random.choice(myparts, 3))

The output should be in flat list form:

6717108
8670842
8671040
8671069
8672185
8672317
8672481
8672533
8672610
8672611
8688747
8688760
8688836
8689003
8689037
8672233
8688891
8688908
8688971
8689078

Question is:

My code doesn't handle lines that have fewer than 3 elements, and the result doesn't come out in the list format shown above.

passion

3 Answers

4

You only need to draw randomly if you have more than three elements on a line. But also your code samples "with replacement", meaning it's possible to draw the same value twice (or even three times). So add the flag replace=False, or better yet just use random.sample(); you don't need numpy for this.

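import random

random_list = []
with open('my_inputFile') as myFile:
    for line in myFile:
        myparts = line.split()
        if len(myparts) > 3:
            # More than three items: draw 3 without replacement
            random_list.append(random.sample(myparts, 3))
        else:
            # Three or fewer items: keep them all
            random_list.append(myparts)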

It's not clear what your problem is with the output, but the random_list you create this way is a list of lists. You can print it out however you want.
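For instance, here is a minimal sketch of both output shapes from the question: one row per input line, and a single flat list. The helper name `sample_line` and the in-memory `lines` list are hypothetical stand-ins for illustration, not part of the original code.

```python
import random

def sample_line(parts, k=3):
    """Return k distinct random items from parts, or all of them if there are k or fewer."""
    return random.sample(parts, k) if len(parts) > k else list(parts)

# Hypothetical stand-in for the lines read from the input file
lines = [
    "6717108 8373270 8670842 8671024 8671040 ",
    "8671069 8672185 ",
    "8672610 ",
]
random_list = [sample_line(line.split()) for line in lines]

# One row per input line, as in the first desired output
for row in random_list:
    print(" ".join(row))

# Flattened into a single list, as in the second desired output
flat_list = [item for row in random_list for item in row]
```

Because `random.sample()` returns a new list, the original `parts` are left untouched.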

alexis
2

You need the extend method if you want a flat list at the end. Also specify replace=False if you don't want to choose the same item more than once:

import numpy as np

random_list = []

with open('pathToFile/inputFile', 'r') as f:
    for line in f:
        # split() with no argument ignores trailing spaces and the newline
        myparts = line.split()
        if len(myparts) <= 3:
            random_list.extend(myparts)
        else:
            random_list.extend(np.random.choice(myparts, 3, replace=False))

random_list
['8670842',
 '6717108',
 '8671024',
 '8671069',
 '8672185',
 '8672363',
 '8672317',
 '8672587',
 '8672610',
 '8688827',
 '8672661',
 '8688792',
 '8688884',
 '8689037',
 '8689003',
 '8672233',
 '8688891',
 '8688908',
 '8688971',
 '8689078']
Psidom
  • How can I save it to a file so that I can iterate over it later, checking whether my number is in this list? – passion Jun 02 '17 at 14:42
  • Writing a list to a file is pretty trivial; see [here](https://stackoverflow.com/questions/899103/writing-a-list-to-a-file-with-python). – Psidom Jun 02 '17 at 14:44
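
As a minimal sketch of what the comment above asks for, the flat list can be written one item per line and later read back into a set for membership checks. The file name `sampled_ids.txt` and the short `random_list` below are assumptions for illustration only.

```python
from pathlib import Path

# Hypothetical stand-in for the flat random_list built above
random_list = ['8670842', '6717108', '8671024', '8671069', '8672185']

# Hypothetical output file name; one item per line
out_path = Path("sampled_ids.txt")
out_path.write_text("\n".join(random_list) + "\n")

# Later: load the items into a set, which makes "in" checks fast
sampled = set(out_path.read_text().split())
print("8671069" in sampled)  # True
print("1234567" in sampled)  # False
```

The items are stored as strings, so the number being checked has to be a string as well (or both sides converted with `int`).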
1
import random

random_list = []
with open('my_inputFile', "r") as myFile:
    for line in myFile:
        # split() with no argument drops trailing spaces and the newline
        myparts = line.split()
        l = len(myparts)
        if l < 4:
            random_list += myparts
        else:
            for i in range(3):
                # pop a random element so it cannot be drawn again
                random_list.append(myparts.pop(random.randrange(l)))
                l -= 1

This should do it.

Gnudiff
  • You're sampling "with replacement", i.e. you'll often draw the same number twice. – alexis Jun 02 '17 at 14:45
  • No, I won't because I am popping off the items from the list. – Gnudiff Jun 02 '17 at 15:23
  • Oh I see. You're right about that, but it's a pretty convoluted and inefficient way to do this. (Also why do you keep taking the `len()`? You know how long it's going to be.) – alexis Jun 03 '17 at 08:37
  • You're right. This can be improved, for example, like this. In general, however, since we are talking about just 3 operations, I don't think the performance hit would be important. But I upvoted the accepted answer, since it seemed better. – Gnudiff Jun 03 '17 at 09:49
  • I'm not worried about micro-optimization, but it's just not a good way to go about it. It's three operations **per row,** we don't know how many rows there are, and you mangle the input data (which the OP might have more uses for). – alexis Jun 03 '17 at 15:31
  • true as well :) – Gnudiff Jun 03 '17 at 18:43