
How do I split a folder of video files into train and test folders, based on dataframe variables that tell me which video belongs in the train folder and which in the test folder (in Python 3.0)? The videos are spread across separate category folders.

Each video is located in a category directory, for instance:

C:\Users\Me\Videos\a
C:\Users\Me\Videos\b

This means that every category needs its own "train" and "test" folder, like:

C:\Users\Me\Videos\a\train
C:\Users\Me\Videos\a\test

I also have a CSV file (EDIT: a CSV file, not an Excel sheet) containing the following information. So I don't want my train/test split to be random; it should follow the binary flags in the file.

videoname |test|train|category|
-------------------------------
video1.mp4| 1  |0    |a       |
video2.mp4| 1  |0    |b       |
video3.mp4| 1  |0    |c       |
video4.mp4| 0  |1    |c       |

Can anyone point me in the right direction on how to use this file to do the split for me? Can I somehow load the file into a dataframe that tells Python where to move each file?
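You don't strictly need a dataframe for this: `csv.DictReader` from the standard library already yields one dictionary per row, keyed by the header names. A minimal sketch, assuming the file is plain comma-separated with exactly the four columns shown in the table above (`SAMPLE_CSV` and `load_rows` are illustrative names, not from the original post):

```python
import csv
import io

# Hypothetical CSV content matching the table in the question (comma-separated).
SAMPLE_CSV = """videoname,test,train,category
video1.mp4,1,0,a
video2.mp4,1,0,b
video3.mp4,1,0,c
video4.mp4,0,1,c
"""


def load_rows(csvfile):
    """Return one dict per data row, keyed by the header names."""
    return list(csv.DictReader(csvfile))


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # Every value comes back as a string, e.g. '1', not the integer 1.
    print(row['videoname'], row['category'], row['test'])
```

With a real file you would pass `open(path, newline='')` instead of the `io.StringIO` object. If you do prefer a DataFrame, `pandas.read_csv` loads the same file into one; note that pandas parses the 0/1 columns as integers by default, whereas `DictReader` leaves them as strings.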

EDIT:

import os
import csv
from collections import defaultdict

videoroot = r'H:\Desktop'
transferrable_data = defaultdict(list)
with open(r'H:\Desktop\SVW.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        video_path_source = os.path.join(videoroot, row['Genre'], row['FileName'])
        if row['Train 1?'].strip() == '0':  # DictReader yields strings, so compare with '0', not the integer 0
            split_type = 'test'
        else:
            split_type = 'train'
        video_destination_path = os.path.join(videoroot, row['Genre'], split_type, row['FileName'])
        transferrable_data[video_path_source].append(video_destination_path)
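One detail worth checking with this kind of loop: `csv.DictReader` returns every field as a string, so a flag has to be compared with `'0'` (text), not the integer `0`. An integer comparison is never true and would silently send every video to `train`. A small sketch of a helper that makes the check explicit; the name `split_for` is hypothetical, and the column `'Train 1?'` is taken from the code above and assumed to hold only `0` or `1`:

```python
def split_for(row, flag_column='Train 1?'):
    """Return 'train' or 'test' for one csv.DictReader row.

    The flag is read as text, so it is compared with '1'/'0' after stripping
    whitespace; comparing with the integer 0 would never match.
    """
    flag = row[flag_column].strip()
    if flag == '1':
        return 'train'
    if flag == '0':
        return 'test'
    # Anything else (empty cell, typo) is reported instead of defaulting to train.
    raise ValueError(f"unexpected value {flag!r} in column {flag_column!r}")


print('0' == 0)                        # False: a str never equals an int
print(split_for({'Train 1?': '0'}))    # test
print(split_for({'Train 1?': ' 1 '}))  # train
```

Raising on unexpected values is a design choice: it surfaces bad rows early rather than quietly filing them under `train`.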
Sparkiepandas
  • Make sure your code compiles: VIDEO_ROOT_FOLDER = 'C:\Users\Me\Videos' (needed to close the string), transferrable_data = defaultdict(list) (not csv.DictReader), and if(row[11]) should be replaced by if(row[1]). The conditions should be nested inside the for loop. Could you apply these changes and tell me if your problem is solved? – madjaoue Jun 28 '18 at 15:46
  • I changed the code, and it compiles with no errors. If I change the row indices from numbers back to your "category" names, it gives an error saying: "TypeError: string indices must be integers". I can see that the appending works, but it doesn't find the filename; it returns the number 9 for some reason, which I cannot trace back in my CSV file. Also, I think the 11 is right because it is the 11th "column" when opening the CSV in Excel. Btw, I also added from collections import defaultdict, which I think is what I needed to do. Thanks for helping a beginner! – Sparkiepandas Jun 28 '18 at 18:53
  • Is it possible that the numbers in row[1] are seen as character positions? I thought (for some reason) that it referred to the 1st variable. Though when changing it back to your row['videoname'], it does not seem to work? – Sparkiepandas Jun 28 '18 at 19:11
  • If you want to use row['videoname'], your code should be more like : with open(r'C:\Users\Me\file.csv') as csvfile: reader = csv.DictReader(csvfile) for row in reader: etc. – madjaoue Jun 28 '18 at 21:39
  • Okay Mium, I think I got it. The thing I did wrong was excluding transferrable_data, because I thought the DictReader would store my dictionary. Thanks for the help so much! – Sparkiepandas Jun 29 '18 at 08:13
  • You're welcome. In this case you can mark this post as solved ;) – madjaoue Jun 29 '18 at 08:41
  • Hi Mium, I celebrated too early haha. For some reason it only returns the last filename (the last one in the CSV file). I have the feeling I am doing something wrong in the loop, with the indentation or the placement of some parts. Can you check my code? I edited it again above. Don't mind the names of the columns, they are like that in the file. – Sparkiepandas Jun 29 '18 at 08:58
  • This should construct your transferrable_data. What do you get when you print transferrable_data at the end of the loop? – madjaoue Jun 29 '18 at 09:39
  • Hmmm, strange, it seems to work now. On another computer it kept crashing... Thanks! – Sparkiepandas Jun 29 '18 at 10:00
  • Sure ? :D If it's working, you can mark my answer as solution for your post in order to close it. – madjaoue Jun 29 '18 at 10:15
  • Hmm still an error when trying to do the actual copying. Transferrable data looks like this: defaultdict(list, {'C:\\Users\\ME\\Videos\\archery\\10157___9040036fdf2c4e8296c38232cf30c6f4.mp4': ['C:\\Users\\ME\\Videos\\archery\\train\\10157___9040036fdf2c4e8296c38232cf30c6f4.mp4'], (etc.). Which looks like it supposed to be. Although, when performing the copy command it gives the following error, stating it does not accept lists: TypeError: stat: path should be string, bytes, os.PathLike or integer, not list – Sparkiepandas Jun 29 '18 at 12:48
  • Oh my bad. I updated my code (last bit of code). Try it, it should work. – madjaoue Jun 29 '18 at 13:00
  • It works!!!!!!! :) – Sparkiepandas Jun 29 '18 at 13:37
  • Congrats !!! :) – madjaoue Jun 29 '18 at 14:11

1 Answer


Well, the first thing to do is to read your CSV file and build a mapping from each source file to its destination paths:

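import csv
import os
from collections import defaultdict

VIDEO_ROOT_FOLDER = r'C:\Users\Me\Videos'  # raw string, so "\U" is not read as an escape sequence
transferrable_data = defaultdict(list)  # source path -> list of destination paths
with open(r'C:\Users\Me\file.csv', newline='') as csvfile:  # path to your CSV file
    for row in csv.DictReader(csvfile):
        video_source_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], row['videoname'])
        if row['test'] == '1':  # CSV values are strings, so compare with '1', not 1
            split_type = 'test'
        else:  # I suppose you can only dispatch to test or train in a row
            split_type = 'train'
        video_destination_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], split_type, row['videoname'])
        transferrable_data[video_source_path].append(video_destination_path)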
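To test the mapping before pointing it at the real file, the same logic can be wrapped in a function that accepts any file-like object and checked against an in-memory CSV. A sketch assuming the column names from the question's table (`build_mapping` is a hypothetical name, and the `.strip()` calls are an extra precaution against stray spaces in the cells):

```python
import csv
import io
import os
from collections import defaultdict


def build_mapping(csvfile, root):
    """Map each source video path to a list of destination paths.

    Assumes the columns 'videoname', 'category' and 'test' from the question,
    with 'test' holding '1' for test videos and anything else for train.
    """
    mapping = defaultdict(list)
    for row in csv.DictReader(csvfile):
        category, name = row['category'].strip(), row['videoname'].strip()
        split_type = 'test' if row['test'].strip() == '1' else 'train'
        source = os.path.join(root, category, name)
        mapping[source].append(os.path.join(root, category, split_type, name))
    return dict(mapping)


sample = io.StringIO("videoname,test,train,category\n"
                     "video1.mp4,1,0,a\n"
                     "video4.mp4,0,1,c\n")
for src, dests in build_mapping(sample, 'Videos').items():
    print(src, '->', dests)
```

Because `os.path.join` is used throughout, the printed separators depend on the operating system the script runs on.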

Then you can write a script that moves your files to the correct paths, using one of the following two methods:

import os
os.rename("path/to/current/video", "path/to/destination/video")  # full target file path; its folder must already exist

or, if you need to copy instead (so your video folder is left unaltered):

from shutil import copyfile
copyfile("path/to/current/video", "path/to/destination/video")  # dst must be the complete target file name
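Both `os.rename` and `shutil.copyfile` expect the full destination file path, and neither creates missing folders, so the `train`/`test` subfolders must exist first. A sketch of a helper that handles that for either method; `transfer` is a hypothetical name, and it swaps `os.rename` for `shutil.move`, which falls back to copy-and-delete when the destination is on another drive:

```python
import os
import shutil
import tempfile


def transfer(src, dest, move=False):
    """Copy (or move) one file to dest, creating dest's folder if needed."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if move:
        shutil.move(src, dest)  # unlike os.rename, also works across drives
    else:
        shutil.copyfile(src, dest)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, 'a', 'video1.mp4')
    os.makedirs(os.path.dirname(src))
    with open(src, 'wb') as f:
        f.write(b'fake video')
    transfer(src, os.path.join(root, 'a', 'train', 'video1.mp4'))
    print(sorted(os.listdir(os.path.join(root, 'a'))))  # ['train', 'video1.mp4']
```

Keep in mind that a moved file is gone from its source, so `move=True` only makes sense when each video has a single destination.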

Let's say, for example, that your mapping is:

transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}

you can do something like:

import os
from shutil import copyfile

transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}
for src, destination_list in transferrable_data.items():
    for dest in destination_list:  # each value is a list, so copy to every path in it
        os.makedirs(os.path.dirname(dest), exist_ok=True)  # create the train/test folder if missing
        copyfile(src, dest)
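To try the whole thing on a throwaway folder tree before running it on real videos, here is a sketch that recreates the question's sample layout in a temporary directory and copies each video straight into its split folder. It skips the intermediate dictionary, which only matters when one source can have several destinations. `split_videos` and `SAMPLE_CSV` are hypothetical names, and the column names are assumed from the question's table:

```python
import csv
import io
import os
import shutil
import tempfile

# The sample table from the question, as comma-separated text.
SAMPLE_CSV = """videoname,test,train,category
video1.mp4,1,0,a
video2.mp4,1,0,b
video3.mp4,1,0,c
video4.mp4,0,1,c
"""


def split_videos(csvfile, root):
    """Copy every listed video into <root>/<category>/<train|test>/."""
    copied = []
    for row in csv.DictReader(csvfile):
        split_type = 'test' if row['test'].strip() == '1' else 'train'
        src = os.path.join(root, row['category'], row['videoname'])
        dest = os.path.join(root, row['category'], split_type, row['videoname'])
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(src, dest)
        copied.append(os.path.relpath(dest, root))  # path relative to root, for reporting
    return copied


with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder "videos" in their category folders.
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        folder = os.path.join(root, row['category'])
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, row['videoname']), 'wb').close()
    for path in split_videos(io.StringIO(SAMPLE_CSV), root):
        print(path)
```

Once the printed paths look right, the same function can be called with `open(csv_path, newline='')` and your real video root.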
madjaoue
  • Hi, thanks for your quick reply Mium. I have to say that I made a mistake: I am not using an Excel file but a CSV file. I got it working for one subcategory; now I am trying to get it working for all categories! – Sparkiepandas Jun 28 '18 at 07:37
  • In that case I think this method would still work. To read your csv easily, you can use csv.DictReader : https://docs.python.org/2/library/csv.html#csv.DictReader Good luck ! – madjaoue Jun 28 '18 at 08:48
  • Thanks again Mium. I think I am getting there, but I do have some trouble with the last line of your mapping code block. Is it possible that there is something wrong with the brackets? Also, when printing for instance video_destination_path, I am only seeing one path. I have added the code that I used to my question. – Sparkiepandas Jun 28 '18 at 15:05