
I'm setting up a script that takes .jpg images named {number}.jpg from a folder and compares that number, multiplied by the frame rate, to the ranges given in a csv file. Each .jpg is then copied into the same folder as the csv that contains the range it falls in.

So the csv data looks like:

477.01645354635303,1087.1628371628808
1191.5980219780615,1777.622457542435
1915.5956043956126,2525.6515684316387
2687.7457042956867,3299.803336663285
3429.317892107908,4053.6603896103848
4209.835924075932,4809.700129870082

(there are many files but this is one full example)

Each number is compared to each of these ranges and placed in the corresponding folder. If I just print the target file and destination, everything works fine and the results are as expected. But if I use any of the shutil copy functions (copy, copyfile, copy2), the loop breaks.
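The matching rule described above can be sketched on its own. This is a minimal sketch, assuming each csv row is exactly `start,end` and that a value must lie strictly between the two (as in the question's `>`/`<` test). The helper names `frame_value`, `load_ranges` and `in_any_range` are hypothetical; the `30000 / 1001` frame rate and the `(number - 1)` offset are taken from the question's loop.

```python
import csv
import io

FPS = 30000 / 1001  # frame rate used in the question's loop


def frame_value(image_number):
    """Map an image named {image_number}.jpg to a frame position.

    Mirrors the question's formula: one image per second, numbered from 1.
    """
    return (image_number - 1) * FPS


def load_ranges(lines):
    """Parse 'start,end' csv rows into (start, end) float tuples."""
    return [(float(start), float(end)) for start, end in csv.reader(lines)]


def in_any_range(value, ranges):
    """True if value lies strictly inside one of the (start, end) ranges."""
    return any(start < value < end for start, end in ranges)


# Two rows from the example csv above
sample = io.StringIO("477.01645354635303,1087.1628371628808\n"
                     "1191.5980219780615,1777.622457542435\n")
ranges = load_ranges(sample)
print(in_any_range(frame_value(17), ranges))  # True: image 17 maps to about 479.5
```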

The file structure looks like:
Data
|-Training
|--COMPRESSION (CPR)
|---COMPRESSION (CPR).csv
|---Where the image data would go
|--More folders..
|-Validation
|--Same as Training
|-Test
|--Same as Training

This is Python 3. I'm running VS Code on an Ubuntu (Pop!OS) machine. I've tried each of the shutil copy functions that fit this case (copy, copy2, copyfile). I've tried copying to different folders and that works. If I copy the files to the parent folder (i.e. Training in the above hierarchy) instead of the subdirectories, it also works fine. However, I need them in the subdirectories for labeling purposes.

import csv
import ntpath
import os
from pathlib import Path
from shutil import copy2

for cur in file_list:
    with open(cur, 'r') as img:  # opened but never read; only the file name is used
        filename = ntpath.basename(cur)
        frame_num = int(filename[:-4]) # get number from filename
        frame_num = (frame_num - 1) * (30000./1001.) # it's one second from each frame in a video
        training = get_folders(train_path)
        for folder in training:
            # get_files() lists every file in the folder, so [0] is whichever file it returns first
            train_csvfile = get_files(train_path + folder)
            if len(train_csvfile) > 0:
                with open(train_csvfile[0], 'r', encoding='latin-1', newline='') as source:
                    train_reader = csv.reader(source, delimiter = ',')
                    for trdata in train_reader:
                        if frame_num > float(trdata[0]) and frame_num < float(trdata[1]):
                            tr_path = os.path.join(train_path + folder, ntpath.basename(cur))
                            copy2(cur,tr_path)
                            print('Copied {} to training folder {}.'.format(filename, tr_path))

Code for getting the files and folders:

def get_folders(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

def get_files(a_dir):
    # Recursively lists every file under a_dir (not only csv files), in filesystem order
    a_dir = Path(a_dir)
    return [f for f in a_dir.glob('**/*') if f.is_file()]

file_list = get_files('/media/username/Seagate Expansion Drive/EXP 3/S1 C2/frames')

The full output is:

Copied 000017.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000017.jpg.
Copied 000018.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000018.jpg.
Copied 000019.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000019.jpg.
Copied 000021.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000021.jpg.
Traceback (most recent call last):
  File "tfinput.py", line 39, in <module>
    for trdata in train_reader:
_csv.Error: line contains NULL byte

Those files are copied correctly (but ONLY those four out of hundreds).

The csv files are not altered at all in this script. The script gets through four images and crashes with the above error. It correctly places these four images. If I try to run the script again without regenerating the data, it crashes immediately. However, if I don't use the copy function, everything works fine and all of the correct input and output directories are given in my print statements. The script can also be rerun without regeneration when there is no copy statement. This makes me think there must be some kind of overwrite issue but since I don't actually edit the csv files I can't put my finger on it.
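One possible explanation that fits these symptoms (a hypothesis, not something confirmed in this thread): `get_files()` returns every file under the label folder, in whatever order the filesystem lists them, and the loop reads `train_csvfile[0]`. Once a copied `.jpg` lands in that folder, it can come back first, and `csv.reader` then tries to parse a JPEG, which contains NUL bytes. That would also explain why a rerun fails immediately and why copying to the parent Training folder works (the copies never land in the folder being scanned). A sketch of a hypothetical `get_csv_files()` helper that only ever returns csv files:

```python
import tempfile
from pathlib import Path


def get_csv_files(a_dir):
    """Return only the .csv files under a_dir (recursively), sorted by path.

    Hypothetical variant of the question's get_files(): filtering on the
    suffix means a copied .jpg can never end up as train_csvfile[0].
    """
    return sorted(f for f in Path(a_dir).glob('**/*.csv') if f.is_file())


# Demo: a label folder that already holds a copied frame
with tempfile.TemporaryDirectory() as tmp:
    label_dir = Path(tmp) / "COMPRESSION (CPR)"
    label_dir.mkdir()
    (label_dir / "COMPRESSION (CPR).csv").write_text("477.01645354635303,1087.1628371628808\n")
    # JPEG-like bytes that include NUL (0x00) bytes
    (label_dir / "000017.jpg").write_bytes(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")
    print([p.name for p in get_csv_files(label_dir)])  # ['COMPRESSION (CPR).csv']
```

In the question's loop, the corresponding change would be `train_csvfile = get_csv_files(os.path.join(train_path, folder))`; `os.path.join` also avoids relying on `train_path` ending with a slash.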

I expect that it should simply copy the files from the source to destination.

EDIT: I went ahead and printed the whole file it gets stuck on. It seems to read the first line and then crash. I tested this on another file and confirmed that it copies only the files within the first range and then crashes.

EDIT 2: I was able to get this working by wrapping the block starting with for trdata in train_reader: in a try-except block; however, it skipped a lot of entries.

EDIT 3: For those curious, I never figured out the issue, although I suspect it was an overwrite issue, since checking for NULL values without the copy statement turned up nothing. I refactored the code to first write a text file of the folder and file names, then read that file and copy the files. That worked perfectly.
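The two-pass refactor described in EDIT 3 can be sketched roughly as follows. Unlike the original, which wrote the plan to a text file, this sketch keeps it in memory; `build_plan()` and `execute_plan()` are hypothetical names, and it assumes each label folder holds csv rows of the form `start,end`. Because every csv is read before anything is copied, nothing written in pass 2 can be picked up as input in pass 1.

```python
import csv
from pathlib import Path
from shutil import copy2

FPS = 30000 / 1001  # frame rate used in the question's loop


def build_plan(image_paths, label_root):
    """Pass 1: work out every (source, destination) pair without copying."""
    # Read all ranges up front, while the label folders still hold only csv files
    ranges = {}
    for label_dir in sorted(Path(label_root).iterdir()):
        if not label_dir.is_dir():
            continue
        for csv_path in sorted(label_dir.glob('*.csv')):
            with open(csv_path, 'r', encoding='latin-1', newline='') as f:
                rows = [row for row in csv.reader(f) if row]  # skip blank rows
            ranges.setdefault(label_dir, []).extend(
                (float(start), float(end)) for start, end in rows)

    plan = []
    for img in map(Path, image_paths):
        value = (int(img.stem) - 1) * FPS  # same mapping as the question
        for label_dir, spans in ranges.items():
            if any(start < value < end for start, end in spans):
                plan.append((img, label_dir / img.name))
    return plan


def execute_plan(plan):
    """Pass 2: copy the files; no csv is read from here on."""
    for src, dst in plan:
        copy2(src, dst)
```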

Thank you for any help!!

  • The error indicates there is a null byte in one of the csv files. Have you tried to [catch the exception](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) and inspect/print the file name, line number or other relevant info? If you search for the exception message `_csv.Error: line contains NULL byte` there are a number of SO Q&As; maybe one is a duplicate. – wwii Jun 24 '19 at 16:15
  • Thanks @wwii this helped me out! I posted the solution I found down in the answers. Printing the filenames was the debugging step I didn't think of :) – Conner Pinson Jun 24 '19 at 16:31
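Following the suggestion in the first comment, a sketch that reports which file and line fail to parse. `read_ranges_verbose()` is a hypothetical helper that opens the file with the same arguments as the question's loop. It also catches `ValueError` and `IndexError`, on the assumption that a non-csv file which gets past the reader would then fail at `float()` or at `row[1]` instead.

```python
import csv


def read_ranges_verbose(csv_path):
    """Read (start, end) rows from csv_path, reporting where parsing fails."""
    with open(csv_path, 'r', encoding='latin-1', newline='') as source:
        reader = csv.reader(source, delimiter=',')
        try:
            return [(float(row[0]), float(row[1])) for row in reader if row]
        except (csv.Error, ValueError, IndexError) as exc:
            # reader.line_num counts the source lines read so far
            print('Failed on {} at line {}: {!r}'.format(csv_path, reader.line_num, exc))
            raise
```

Used in place of the inner `csv.reader` loop, it would print the exact path that triggers the error before re-raising it.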

1 Answer


I don't think it's a problem with the copy. From the error message, it looks like there's a NULL byte in the CSV file being read. Write some print statements and inspect that file.

You may find this helpful. "Line contains NULL byte" in CSV reader (Python)
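To act on that advice, a small sketch that lists which files under a folder actually contain NUL bytes. `files_with_nul()` is a hypothetical helper; it reads each file fully into memory, which is fine for csv files and individual frames but slow across a large tree. Pointing it at a label folder would show whether the offending file is a csv or one of the copied images.

```python
from pathlib import Path


def files_with_nul(a_dir, pattern='**/*'):
    """Yield files under a_dir whose raw bytes contain a NUL (0x00) byte."""
    for path in sorted(Path(a_dir).glob(pattern)):
        if path.is_file() and b'\x00' in path.read_bytes():
            yield path
```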

Chankey Pathak
  • That's what I originally thought, but the script works if copy isn't used. I went ahead and printed the whole file it gets stuck on. It seems to read the first line and then crash. I tested this on another file and confirmed it just gets the files within the first range and then crashes. – Conner Pinson Jun 24 '19 at 16:21