36

I'm facing this simple task, but I'm also wondering what the easiest and shortest way to do it is.

My goal is to move a given number of random files from one directory to another. This task is part of creating two datasets I need for machine learning: a training set and a testing set. I want to move 10% of the files out of a directory to get the dataset against which I can test my categorizer, leaving the training set in the source directory.

So, what's the most compact typing for this "move n random files" task?

Thanks in advance - as usual -

Max
  • 1
    If your files have a uniform, evenly distributed numeric file name pattern, `mv *0 otherdir` would move 10% -- it's not random, but it's evenly distributed, which may even be closer to what you actually want; and, it's easy to document and trivial to reproduce. – tripleee Dec 25 '12 at 19:14
  • 1
    Unfortunately it's not my case, but your suggestion is really good. – Max Dec 25 '12 at 19:28

8 Answers

67

Use a combination of shuf and xargs (it's a good idea to look at their documentation with man):

shuf -n 10 -e -- * | xargs -I {} mv -- {} path-to-new-folder

The command above selects 10 random files from the current folder (the * part) and then moves them to the new folder.

Update

Although longer, one might find this version even simpler to understand:

ls | shuf -n 10 | xargs -I {} mv -- {} path-to-new-folder

shuf just generates a random permutation of the standard input, limiting the results to 10 (like using head, but probably faster).
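
For comparison, the same "pick n at random, then move" idea can be sketched in Python. This is only a minimal sketch, not part of the answer above: the function name `move_random_files` and its parameters are hypothetical, and unlike `*` or `ls` it also considers hidden entries.

```python
import os
import random
import shutil


def move_random_files(src_dir, dst_dir, n):
    """Move up to n randomly chosen entries from src_dir into dst_dir.

    Hypothetical helper for illustration; returns the names that were moved.
    """
    # os.listdir also returns hidden entries, unlike the shell's `*`.
    names = os.listdir(src_dir)
    # random.sample picks without replacement; cap n so that a small
    # directory does not raise ValueError.
    chosen = random.sample(names, min(n, len(names)))
    for name in chosen:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen
```

Calling `move_random_files("data", "test_set", 10)` would then play the role of the one-liner, assuming both directories already exist.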

boechat107
3

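You could use bash's random number generator, $RANDOM, which yields an integer between 0 and 32767, to decide whether each file goes into set1 or set2. Each file lands in set2 with a probability of about 10%, so the split is approximate rather than exact: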

for file in ./*; do
  val=$RANDOM
  if test $val -gt 3276; then
    mv "$file" ../set1
  else
    mv "$file" ../set2
  fi
done
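
To see why a per-file draw gives only an approximate split, here is a small Python sketch of the same decision rule. It works on a list of names rather than real files; `split_by_threshold` is a hypothetical name, and `randrange(32768)` is just a stand-in for `$RANDOM`.

```python
import random


def split_by_threshold(names, threshold=3276, rng=random):
    """Assign each name to set1 or set2 using an independent random draw.

    Mirrors the shell loop: a draw in 0..32767 above `threshold` sends the
    name to set1, otherwise to set2 (about 10% with the default threshold).
    """
    set1, set2 = [], []
    for name in names:
        val = rng.randrange(32768)  # stand-in for bash's $RANDOM
        if val > threshold:
            set1.append(name)
        else:
            set2.append(name)
    return set1, set2
```

Because every file is decided independently, the size of set2 varies from run to run around 10% of the total. If an exact count is required, selecting a fixed number of names is the safer approach.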
jfg956
2

You can shuffle the file list using shuf or sort -R. But you still need to take a subset, which you can do with head/tail.
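
As a rough illustration of "shuffle, then take a subset", the following Python sketch returns both the head and the tail of one random permutation. The name `shuffle_and_split` is made up for this example.

```python
import random


def shuffle_and_split(names, k, rng=random):
    """Return (first k names, remaining names) of a random permutation."""
    shuffled = list(names)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    # The head plays the role of `head -n k`, the tail is everything else.
    return shuffled[:k], shuffled[k:]
```

The two returned lists never overlap, which is the property needed for a testing set and a training set.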

reivilos
2

The question is pretty old, but for the record this works on OSX.

You have to install gshuf with brew install coreutils, then use:

tenpercent=$(( $(ls | wc -l) * 10 / 100 ))

ls | gshuf -n "$tenpercent" | xargs -I {} mv {} destination/path/
Ettore Rizza
2

You can also do this with Python. I find this easier.

Here is a Python script I use to move a random percentage of images, together with the associated label files that CV image datasets typically require. Note that it moves the files rather than copying them, because I do not want my test data to remain in my training dataset.

I use the script below for YOLO training sets, where labels and images are in the same directory and the labels are .txt files.

import os
import random

# Source and destination directories
directory = '/MauiData/maui_complete_sf_train'
target_directory = '/MauiData/maui_complete_sf_test'
data_set_percent_size = 0.07  # fraction of the images to move

# List all files in the source directory that are images
files = [f for f in os.listdir(directory) if f.endswith('.jpg')]

# Select a fraction of the files at random, without replacement
random_files = random.sample(files, int(len(files) * data_set_percent_size))

# Move each selected image together with its label file
# (same base name, .txt extension) by renaming its path
for image_name in random_files:
    label_name = os.path.splitext(image_name)[0] + '.txt'
    os.rename(os.path.join(directory, image_name),
              os.path.join(target_directory, image_name))
    os.rename(os.path.join(directory, label_name),
              os.path.join(target_directory, label_name))
Pullmyleg
2

Alternative version with find to avoid problems with folders. It copies 31415 randomly chosen files into /home/user/dir/

find . -maxdepth 1 -type f | sort -R | head -31415 | xargs cp -t /home/user/dir/
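
A comparable approach in Python, sketched under the assumption that only regular files directly inside the source directory should be considered. `copy_random_regular_files` is a hypothetical helper, and like the command above it copies rather than moves.

```python
import os
import random
import shutil


def copy_random_regular_files(src_dir, dst_dir, n):
    """Copy up to n randomly chosen regular files from src_dir to dst_dir.

    Subdirectories are skipped, similar in spirit to `find -maxdepth 1 -type f`.
    """
    with os.scandir(src_dir) as entries:
        # follow_symlinks=False keeps symlinks out, as `-type f` does.
        files = [e.name for e in entries if e.is_file(follow_symlinks=False)]
    chosen = random.sample(files, min(n, len(files)))
    for name in chosen:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen
```

Swapping `shutil.copy` for `shutil.move` would turn this into a move, which is what a training/testing split needs.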
Simone Aonzo
  • This copies files instead of moving them, so according to the user's needs, this would not separate the dataset to a training and testing set, but correct me if I'm wrong – Jack Avante Mar 05 '23 at 18:20
0

Typically, we do this with a Python script or a Java program. Either of these can use a proper RNG to make the random decisions, and then invoke the necessary calls to move files around.
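
One benefit of doing the selection in a script is that the RNG can be seeded, so the same split can be reproduced later. The following is a minimal Python sketch of that idea; the function name `pick_reproducibly` and the `seed` parameter are assumptions made for illustration, and the chosen names would still have to be moved afterwards (for example with `shutil.move`).

```python
import os
import random


def pick_reproducibly(directory, n, seed):
    """Pick up to n names from directory; same seed and contents, same pick."""
    # Sort first: os.listdir returns names in arbitrary order, and a seeded
    # RNG only reproduces the selection if its input order is stable.
    names = sorted(os.listdir(directory))
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(names, min(n, len(names)))
```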

bmargulies
  • Interesting, so you're suggesting using an RNG... of course, moving files by passing a limit to xargs would not result in a proper random selection. I'll think about it – Max Dec 25 '12 at 18:56
  • How can you possibly achieve 'random' without an RNG? – bmargulies Dec 25 '12 at 18:57
  • Of course, it's not possible. I was just thinking about taking some files from a directory with something like ls | head -20. I supposed that repeating this hundreds of times against the same directory, containing hundreds of files, would not always give the same results. Am I wrong maybe? – Max Dec 25 '12 at 19:14
  • Would it be considered good advice to use bash's random number generator? You can start by choosing a random number of files to be moved, then list the files to a tmp.file, read random lines from it to get the files, and move them; `#!/bin/bash #Get the range file_list_no=`ls -1 | wc -l` number=$RANDOM let "number %= $file_list_no" echo "Chosing random number of files to be moved: $number of $file_list_no"` – java_xof Dec 25 '12 at 20:26
  • Wow, I didn't know about bash's random generator. Anyway I need to get a given number of random files. – Max Dec 25 '12 at 22:08
  • ls | head will always be in the same order. – bmargulies Dec 26 '12 at 02:09
0

With the zsh shell:

mv -- *(oe['REPLY=$RANDOM'][1,10]) /path/to/destination/

Where we use the oe glob qualifier to order the glob expansion based on the evaluation of the given code (which here returns a random value), and select the first 10.

On recent GNU systems, and with a shell with support for ksh-style process substitution (ksh93, zsh, bash) you can do:

xargs -r0a <(ls -U --zero | shuf -zn10) mv -t /path/to/destination --

ls -U --zero can be replaced with printf '%s\0' * if GNU ls is too old to support --zero, with the difference that if there's no non-hidden file in the current directory, you'll get an error about the failure to move a file called *.

xargs -r0a <(shuf -zen10 -- *) mv -t /path/to/destination --

The command above could also be used as a more correct/reliable/efficient variation on @boechat107's answer, though with the added caveat that you may run into an "Argument list too long" error upon attempting to execute shuf if there's a large number of non-hidden files in the current working directory. The printf-based approach above should be fine in that regard, as printf is generally built into shells and so is not affected by that limitation of the execve() system call.

To move 10% as opposed to 10, with zsh:

files=(*(Noe['REPLY=$RANDOM']))
mv -- $files[1,$#files/10] /path/to/destination/

(10% rounded down).
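
The rounding matters for small directories. As a quick illustration in Python (the helper name `sample_size` is hypothetical), integer division gives the same "rounded down" count:

```python
def sample_size(total, percent=10):
    """Number of files in a `percent` sample of `total`, rounded down."""
    # Multiply before dividing so the integer division happens only once.
    return total * percent // 100


print(sample_size(57))  # 5
print(sample_size(9))   # 0: with fewer than 10 files, nothing is selected
```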

Stephane Chazelas