I'm getting an error that I can't understand... The output looks like this:

Renaming chr1/tests/test-ch1.py to chr1/tests/test-ch1_ref.py...
Traceback (most recent call last):
  File "/Users/eomiso/Workspace/miso-projects/scripts/append_ref_to_filename.py", line 18, in <module>
    os.rename(old_path, new_path)
ValueError: embedded null byte

Code:

import os
import re

# Define the path to the directory containing the files
target_dir = "chr1/tests"
suffix = "_ref"

# Compile a regular expression to match file extensions
ext_pattern = re.compile(r"\.[^.]+$")

for root, dirs, files in os.walk(target_dir):
    for filename in files:
        if ext_pattern.search(filename) and not filename.endswith(suffix):
            old_path = os.path.join(root, filename)
            ext = ext_pattern.search(old_path).group()
            new_path = os.path.join(root, ext_pattern.sub(f"{suffix}\\0", filename)) + ext
            print(f"Renaming {old_path} to {new_path}...")
            os.rename(old_path.strip(), new_path.strip())

The code is a simple script that adds a suffix to all files. I can't understand why there could be a null character in new_path. The file already exists, and since the paths are printed correctly just before the os.rename call, I can't see how a null byte (or character) ends up in new_path.

A bit more context: before adding .encode("utf-8").strip() to the code, I got the `ValueError: embedded null byte` error.

Aesop
  • 1
    Why are you using a regexp to match file extensions? You can use `os.path.splitext()` to separate the filename into a basename and extension. – Barmar Apr 04 '23 at 17:19
  • 1
    Why are you using `.strip()`? If the filename ends with spaces, you won't be able to rename it if you remove them. – Barmar Apr 04 '23 at 17:21
  • 1
    The replacement pattern `f"{suffix}\\0"` ends with a null character (i.e. zero byte). This doesn't show up directly in the output of a print statement. But try `print(len(new_path))` and `print(ord(new_path[-1]))` and you'll see the signs of its presence. – slothrop Apr 04 '23 at 17:22
  • .strip() was just to check whether a null character had been appended at the end of the string for some reason, which I doubted. (And I didn't know that whitespace at the end of a filename is possible. I had a look at [this post](https://stackoverflow.com/questions/2742821/blank-space-after-file-extension-weird-fileinfo-behaviour).) Thanks. – Aesop Apr 04 '23 at 17:31
  • And yes, I know that simply splitting the filename on a dot and taking the last element would work as well. I chose a regex to refresh my memory and ran into this weird behavior. I'm just trying to dig into it. – Aesop Apr 04 '23 at 17:31
  • 1
    Here we don't have a blank space (character 0x20), we have a null 0x00. – slothrop Apr 04 '23 at 17:32
  • 1
    @Aesop what's the motivation for the `\\0` at the end of the replacement expression in `.sub`? Python strings aren't null-terminated: https://stackoverflow.com/questions/24409581/do-python-strings-end-in-a-terminating-null – slothrop Apr 04 '23 at 17:36
  • @slothrop @Barmar Thank you for reaching out to help! The intention was to use a backreference to the entire match. I've just read the documentation and saw that I had it completely wrong. If I wanted the shorter form of backreference, I shouldn't have used `\\`. It should have been one of `\1` to `\99`. And if I want the number 0 for the whole match, I should use `\g<0>`, since `\0` adds a null byte to the string. – Aesop Apr 04 '23 at 17:46
  • But I didn't know that "\\0" and "\0" both represent a null byte. From my limited understanding, shouldn't the former resolve to 2 characters, "\" + "0"? And when I print("\\0") I get the expected printed output "\0". – Aesop Apr 04 '23 at 17:55
  • 1
    @Aesop `print("\\0")` only involves one cycle of unescaping. `re.sub(f"{suffix}\\0...")` involves two cycles of unescaping. First the f-string is interpreted, producing a string that ends in backslash followed by 0. Then when that string is given to re.sub, the function processes those last two characters \0 to produce a zero byte. (This isn't specific to f-strings, a plain string literal would do the same.) – slothrop Apr 04 '23 at 18:36
  • Relevant: https://stackoverflow.com/a/55810892/765091 – slothrop Apr 10 '23 at 12:34
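
The two rounds of unescaping described in the comments can be reproduced without touching the filesystem. The following is a minimal sketch, not a definitive fix: the function names (`add_suffix_broken`, `add_suffix_regex`, `add_suffix_splitext`) and the sample filename are hypothetical, and it assumes the goal is to turn `test-ch1.py` into `test-ch1_ref.py`. It shows the null byte produced by `\\0`, the `\g<0>` whole-match reference mentioned above, and the `os.path.splitext()` alternative Barmar suggested.

```python
import os
import re

# Same extension pattern as in the question.
ext_pattern = re.compile(r"\.[^.]+$")


def add_suffix_broken(filename, suffix):
    # The literal "\\0" is the two characters backslash + "0".
    # re.sub then reads that pair as an octal escape, i.e. chr(0),
    # so the extension is replaced by the suffix plus a null character.
    return ext_pattern.sub(f"{suffix}\\0", filename)


def add_suffix_regex(filename, suffix):
    # \g<0> refers to the entire match (here, the extension).
    # Assumes the suffix itself contains no backslashes.
    return ext_pattern.sub(rf"{suffix}\g<0>", filename)


def add_suffix_splitext(filename, suffix):
    # No regex: split into stem and extension, then reassemble.
    stem, ext = os.path.splitext(filename)
    return f"{stem}{suffix}{ext}"


if __name__ == "__main__":
    # repr() makes the otherwise invisible null character visible.
    print(repr(add_suffix_broken("test-ch1.py", "_ref")))
    print(repr(add_suffix_regex("test-ch1.py", "_ref")))
    print(repr(add_suffix_splitext("test-ch1.py", "_ref")))
```

Printing with `repr()` rather than plain `print()` is what exposes the problem: a terminal typically renders the null character as nothing, which is why the "Renaming ..." line in the question looks correct.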

0 Answers