I have a list of about 155k files. When I call `random.sample(list, 100)`, the result is different from the previous sample, but the two samples look similar.
Is there a better alternative to `random.sample` that returns a new list of 100 random files?
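One way to check whether two samples really are similar is to count how many paths they share. The sketch below uses a hypothetical list of generated paths as a stand-in for the real 155k files. For two independent uniform samples of size k drawn from n items, the expected number of shared items is k*k/n, which here is well under one file.

```python
import random

# Hypothetical stand-in for the real list of ~155k file paths
files = [f"/data/gazette-txt-files/folder_{i % 500}/file_{i}.txt" for i in range(155_000)]

first = random.sample(files, 100)
second = random.sample(files, 100)

# Number of paths that appear in both samples
overlap = len(set(first) & set(second))
print(f"{overlap} of 100 files appear in both samples")

# Each item of `second` is in `first` with probability k/n,
# so the expected overlap is k * k / n
n, k = len(files), 100
expected = k * k / n
print(f"expected overlap: {expected:.3f}")
```

If the measured overlap is usually 0 or 1, the samples are not actually repeating files; any similarity is more likely in how the file names look than in which files were picked.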
```python
import glob
import random

# get_all_folders is defined elsewhere (not shown); it returns the folder names
folders = get_all_folders('/data/gazette-txt-files')

# get all files from all folders
def get_all_files():
    files = []
    for folder in folders:
        files.append(glob.glob("/data/gazette-txt-files/" + folder + "/*.txt"))
    # convert 2D list into 1D
    formatted_list = []
    for file in files:
        for f in file:
            formatted_list.append(f)
    # 200 random text files
    return random.sample(formatted_list, 200)
```
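If the goal is that each new batch contains files not seen in earlier batches, one option (a different technique from calling `random.sample` repeatedly) is to shuffle the full list once and hand out consecutive slices. This is a minimal sketch; `non_overlapping_batches` is a hypothetical helper name, and the file list in the usage part is made up.

```python
import random
from typing import Iterator, List, Optional


def non_overlapping_batches(
    paths: List[str], batch_size: int, seed: Optional[int] = None
) -> Iterator[List[str]]:
    """Yield random batches in which no path repeats until every path has been used."""
    rng = random.Random(seed)
    shuffled = list(paths)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        # The final batch may be smaller than batch_size
        yield shuffled[start:start + batch_size]


# Usage with a hypothetical list standing in for the real file paths
files = [f"file_{i}.txt" for i in range(1_000)]
batches = non_overlapping_batches(files, 100, seed=42)
first = next(batches)
second = next(batches)
print(len(set(first) & set(second)))  # 0: batches never share a path
```

Shuffling 155k strings once is cheap, and after that each batch is just a slice. Once the generator is exhausted, a new one (with a new shuffle) can be created to start another pass.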