I have a folder called train, which contains two subfolders, hot_dog and not_hot_dog. I would like to randomly select images from both subfolders and put them into a new folder called validation, which again has two subfolders called hot_dog and not_hot_dog. The number of randomly selected images should be approx. 20% of the original number of images. The images that were selected and saved under the new folder validation should then be deleted from the original folder train.

Current folder structure looks like this:

[image: current folder structure]

The end result of the folder structure should look like this:

[image: target folder structure]

Darren Tsai
Lexal
  • Hi Lexal, welcome to Stack Overflow. You tagged this question with [R]. Is that because you intended to perform this task with the R statistical programming language? If so, have you tried any code yet? If the answer is yes, it will be easier to help by working from your code. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. – Ian Campbell Apr 08 '20 at 15:02
  • Hey - yes. My intention is to run this code in [R]. As I am quite a beginner with coding, I could not really produce any code, which would be useful in this forum. Why did you remove R in my header? I do not want to get a Python answer. Thanks. – Lexal Apr 08 '20 at 15:12

3 Answers

Here is a base R approach that may work for you. It is not what I would call elegant, but it's relatively easy to understand.

Be sure to replace ~/Stack Overflow/ with the directory that contains your train folder.

In short, we use dir.create to make the new directories (if they do not exist already). Then we use list.files to make a list of the files in each of the two training directories. Then we use sample to take a sample of those files. Lastly we use file.copy to place them into their new home.

# Set the working directory to the folder that contains train/
setwd("~/Stack Overflow/")
# Fraction of each class to put into the validation set
sample.fraction <- 0.2
train.true.dir <- "train/hot_dog"
train.false.dir <- "train/not_hot_dog"
valid.true.dir <- "validation/hot_dog"
valid.false.dir <- "validation/not_hot_dog"
# Create validation/ and its two subfolders (no warning if they already exist)
sapply(c("validation",valid.true.dir,valid.false.dir),function(x){dir.create(x,showWarnings = FALSE)})
# List the files in each training folder
true.files <- list.files(train.true.dir)
false.files <- list.files(train.false.dir)
# Randomly pick the chosen fraction of each class, rounding up
true.sample <- sample(true.files,size = ceiling(length(true.files) * sample.fraction))
false.sample <- sample(false.files,size = ceiling(length(false.files) * sample.fraction))
# Copy the sampled files into the matching validation folder
sapply(true.sample,function(x){file.copy(paste(train.true.dir,x,sep="/"),paste(valid.true.dir,x,sep="/"))})
sapply(false.sample,function(x){file.copy(paste(train.false.dir,x,sep="/"),paste(valid.false.dir,x,sep="/"))})
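
Note that sample draws at random, so each run picks different files. If you want a repeatable split, one option is to call set.seed before sampling. Here is a minimal sketch of that idea; the file names and the seed value 42 are made up for illustration and stand in for the result of list.files.

```r
# Hypothetical file names standing in for list.files(train.true.dir)
true.files <- sprintf("img_%03d.jpg", 1:23)
sample.fraction <- 0.2

# Fixing the seed makes the random draw repeatable across runs
set.seed(42)
first.draw <- sample(true.files, size = ceiling(length(true.files) * sample.fraction))

# Same seed, same draw
set.seed(42)
second.draw <- sample(true.files, size = ceiling(length(true.files) * sample.fraction))

length(first.draw)                   # 5, i.e. ceiling(23 * 0.2)
identical(first.draw, second.draw)   # TRUE
```

Because the size is rounded up with ceiling, a folder that contains at least one file always contributes at least one file to the sample.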

If you wanted to remove those files afterwards, you could use these two lines.

Please make a backup first.

sapply(true.sample,function(x){file.remove(paste(train.true.dir,x,sep="/"))})
sapply(false.sample,function(x){file.remove(paste(train.false.dir,x,sep="/"))})
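
As an alternative to copying and then removing, base R's file.rename can move a file in a single step. This is only a sketch under assumptions: it runs in a temporary directory with empty placeholder files (all names are hypothetical), and file.rename may fail when source and destination are on different file systems, so check its return value before relying on it.

```r
# Work in a throwaway directory so no real data is touched
root <- file.path(tempdir(), "move_demo")
unlink(root, recursive = TRUE)
train.dir <- file.path(root, "train", "hot_dog")
valid.dir <- file.path(root, "validation", "hot_dog")
dir.create(train.dir, recursive = TRUE)
dir.create(valid.dir, recursive = TRUE)

# Twelve empty placeholder files standing in for images
invisible(file.create(file.path(train.dir, sprintf("img_%02d.jpg", 1:12))))

files <- list.files(train.dir)
picked <- sample(files, size = ceiling(length(files) * 0.2))

# file.rename moves each file and returns TRUE or FALSE per file
moved <- file.rename(file.path(train.dir, picked), file.path(valid.dir, picked))

all(moved)
length(list.files(train.dir))   # 9 files left in train
length(list.files(valid.dir))   # 3 files now in validation
```

If any element of moved is FALSE, that file is still in train, so nothing is lost.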

Ian Campbell
  • Unbelievable...that was my first stack overflow question and I get a (correct!!!) answer within 20 min. You made my day!!! – Lexal Apr 08 '20 at 15:30
  • One more issue I figured out: With the code I was able to move 20% of the original files to the new folder "validation". Unfortunately those 20% pictures still stayed in the folder "train". My intention was, that I more or less shift those 20% from the folder "train" to the new folder "validation". – Lexal Apr 08 '20 at 15:47
  • I edited my answer with a way to remove those files. Please be sure to make a backup. – Ian Campbell Apr 08 '20 at 15:57
  • Really nice. Now it works as expected. Great. Thanks a lot for your help! – Lexal Apr 08 '20 at 16:01

First, use setwd() to set the working directory to the folder that contains train. Then run the following code:

# setwd("path/to/folder/containing/train")
path1 <- file.path("train", c("hot_dog", "not_hot_dog"))
path2 <- file.path("validation", c("hot_dog", "not_hot_dog"))
dir.create("validation")
lapply(path2, dir.create)
# For each class, move a random 20% of the files from train to validation
Map(function(x, y){
  file <- dir(x) ; n <- length(file)
  file_selected <- file.path(x, sample(file, ceiling(n * 0.2)))
  # Delete from train only the files that were copied successfully
  copied <- file.copy(file_selected, y)
  file.remove(file_selected[copied])
}, path1, path2)
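
To confirm that the files were really moved and not just copied, you can compare the two folders afterwards. The helper below, check_split, is a hypothetical name and only a sketch: it counts the files on each side and lists any file name that appears in both. The demo runs on placeholder files in a temporary directory.

```r
# Hypothetical helper: summarise a train/validation split for one class
check_split <- function(train.dir, valid.dir) {
  train.files <- list.files(train.dir)
  valid.files <- list.files(valid.dir)
  list(
    n.train = length(train.files),
    n.valid = length(valid.files),
    # File names present in both folders were copied but not removed
    duplicated = intersect(train.files, valid.files)
  )
}

# Demo on empty placeholder files in a temporary directory
root <- file.path(tempdir(), "check_demo")
unlink(root, recursive = TRUE)
train.dir <- file.path(root, "train", "hot_dog")
valid.dir <- file.path(root, "validation", "hot_dog")
dir.create(train.dir, recursive = TRUE)
dir.create(valid.dir, recursive = TRUE)
invisible(file.create(file.path(train.dir, c("a.jpg", "b.jpg", "c.jpg", "d.jpg"))))

# Simulate a copy without removal: b.jpg ends up in both folders
invisible(file.copy(file.path(train.dir, "b.jpg"), valid.dir))

result <- check_split(train.dir, valid.dir)
result$duplicated   # "b.jpg"
```

An empty duplicated entry means no file name exists in both train and validation for that class.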

Darren Tsai
  • This was also a really good solution. I tried it out and it worked. Unfortunately also here the files that I moved to the new folder "validation", still exist in the old folder "train". I have to have them deleted in the old folder "train". – Lexal Apr 08 '20 at 16:00
  • @Lexal If any answers have solved your problem, please mark the preferable one as "accepted". Read [this page](https://meta.stackexchange.com/a/5235/412699) for more details about accepting an answer. Thank you! – Darren Tsai Apr 08 '20 at 16:16

This is the final code which I then used:

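library(here)

# Assumption: path1 was not shown in the original post; train is taken to sit next to validation
path1 <- file.path(here(), "data/hot-dog-not-hot-dog/train", c("hot_dog", "not_hot_dog"))
path2 <- file.path(here(), "data/hot-dog-not-hot-dog/validation", c("hot_dog", "not_hot_dog"))
dir.create(file.path(here(), "data/hot-dog-not-hot-dog/validation"))
lapply(path2, dir.create)
Map(function(x, y){
  file <- dir(x) ; n <- length(file)
  file_selected <- file.path(x, sample(file, ceiling(n * 0.2)))
  file.copy(file_selected, y)
  file.remove(file_selected)
}, path1, path2)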
Lexal