
I am trying to rename a series of PDFs from filenames like this: The New Town Cryer - 01 Oct 2020.pdf to this: 2020-10-01_-_The_New_Town_Cryer.pdf. I've written a bash script that uses sed to accomplish this, but I'm having trouble figuring out how to convert the date from the current three-letter month format using the date command. This is the relevant line of my script so far (at this point the newname variable holds The New Town Cryer - 01 Oct 2020 pdf):

newname="$(echo "$newname" | sed -re 's/^(.*) - (.*) ([^ ]+)$/echo "$(date -d "\2" "+%Y-%m-%d")-\1".\3/')"

The output from this line is echo "$(date -d "01 Oct 2020" "+%Y-%m-%d")-The New Town Cryer".pdf, whereas I was hoping for 2020-10-01-The New Town Cryer.pdf.

Can anyone tell me where I'm going wrong? Thanks!

Edit: to clarify here is my whole script so far, since it seems that my snippet was unclear. The original format of the filenames is The New Town Cryer - No. 1,032 [01 Oct 2020].pdf, which I am trying to convert to the format 2020-10-01_The_New_Town_Cryer.pdf.

#!/bin/bash

find "$1" "*.pdf" -type f -printf "%f\n" | while IFS= read -r f ; do #find all pdfs
  name=$f
  newname="$(echo "$name" | sed -re 's/\./ /g')" # replace .s with spaces to allow 'date'-command to parse the date
  newname="$(echo "$newname" | sed -re 's/\[/!/g')" # replace [s with !s to mark the start of the date
  newname="$(echo "$newname" | sed -re 's/\]/!/g')" # replace ]s with !s to mark the end of the date
  newname="$(echo "$newname" | sed -re 's/(.*) - (.*) (!.*!)/\1\ - \3/')" # remove issue number
  newname="$(echo "$newname" | sed -re 's/\!//g')" # remove the !s to allow 'date'-command to parse the date
  newname="$(echo "$newname" | sed -re 's/^(.*) - (.*) ([^ ]+)$/echo "$(date -d "\2" "+%Y-%m-%d")-\1".\3/')" # reorder the date and name, split at '-', keep the file extension, prepare for date conversion
  newname="$(echo "$newname" | bash )"
  newname="$(echo "$newname" | sed -re 's/ /./g')" # replace remaining spaces with .
  mv "$name" "$newname"
done
xthursdayx
  • I'm having a really hard time reading that code. Why do you have an `echo` command on the right-hand side of your `sed`'s replace expression? – Charles Duffy Oct 14 '20 at 17:15
  • (and how would anything actually _run_ `sed`'s output as a command, for that echo to be invoked?) – Charles Duffy Oct 14 '20 at 17:22
  • Please see the full script I added to the original question. – xthursdayx Oct 14 '20 at 18:31
  • The code added to the question is _extremely_ inefficient, and also has serious security bugs. Command substitutions are slow. External command invocations are slow. Piping generated code to `bash` is very hard to do securely. Don't do any of that. – Charles Duffy Oct 14 '20 at 20:19
  • Alright, thanks for letting me know. I'm still learning, so obviously make a lot of mistakes. I will update my code as per your suggestions below. Thanks again. – xthursdayx Oct 15 '20 at 16:45
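
To illustrate the point about command substitutions and external commands, here is a minimal sketch, assuming the filename format from the edited question, of doing the same kind of string cleanup with bash parameter expansion so that no sed process is spawned at all. The variable names (`stem`, `title`, `ext`) are only illustrative and are not taken from the original script.

```bash
#!/bin/bash
# Sketch only: string cleanup with parameter expansion instead of
# one "echo | sed" pipeline per step.
name='The New Town Cryer - No. 1,032 [01 Oct 2020].pdf'

ext=${name##*.}        # text after the last dot
stem=${name%.*}        # everything before the last dot
stem=${stem//\[/}      # delete every [
stem=${stem//\]/}      # delete every ]
title=${stem%% - *}    # keep only the text before the first " - "
title=${title// /_}    # replace every space with an underscore

printf '%s\n' "$stem" "$title" "$ext"
```

Each expansion runs inside the current shell, so a loop over many files would not fork a subshell and a sed process for every step.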

2 Answers


Using bash's native regex support instead of trying to (ab)use sed here makes the code -- while perhaps longer -- much clearer to read. You can see the following solution working at https://ideone.com/Suw9Ow:

oldname='The New Town Cryer - 01 Oct 2020.pdf'
date_re='(^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*)'
if [[ $oldname =~ $date_re ]]; then
  basename=${BASH_REMATCH[1]}
  day=${BASH_REMATCH[2]}
  month=${BASH_REMATCH[3]}
  year=${BASH_REMATCH[4]}
  ext=${BASH_REMATCH[5]}
  new_date=$(date -d "${day} ${month} ${year}" +%Y-%m-%d)
  newname="${new_date} - ${basename}${ext}"
  echo "Old name: $oldname"
  echo "New name: $newname"
fi
Charles Duffy
  • Okay, it looks like I might be able to update my script entirely to rename the files using bash's regex support. Would this work with a `find` command like the one in the script I added above? – xthursdayx Oct 14 '20 at 18:38
  • Consider `find "$1" -name "*.pdf" -type f -printf '%f\0' | while IFS= read -r -d '' oldname; do ...; done`, putting the code from this answer between the `do` and the `done`. – Charles Duffy Oct 14 '20 at 20:21
  • Note the change from `%f\n` to `%f\0` -- you can't safely store a list of arbitrary filenames in a newline-delimited list, because it's legal for filenames to contain newlines as part of their text; the NUL character is the only one that's universally guaranteed to not be present in a filename. – Charles Duffy Oct 14 '20 at 20:21
  • If it's a different problem that you're running into now, then start a new question. – Charles Duffy Oct 15 '20 at 16:48
  • Thanks for this advice @Charles Duffy. As you can see, I'm still learning how to do this. The only issue is that the code you posted is based on the filenames being in the format `The New Town Cryer - 01 Oct 2020.pdf`, since that is the format that would have been input into the line in my original question. However, that filename is the result of the first 4 `sed` commands in my script. The original filenames are in the format: `The New Town Cryer - No. 1,032 [01 Oct 2020].pdf`. From this I want to remove the issue number and rename to this format: `2020-10-01_The_New_Town_Cryer.pdf`. – xthursdayx Oct 15 '20 at 16:51
  • Ahh. While normally changing a question after it's answered in a way that invalidates that answer is against the rules, I don't mind helping with that in the comment threads in this particular case. Give me a minute to test. – Charles Duffy Oct 15 '20 at 16:57
  • Yeah, sorry for the confusion @Charles Duffy. I changed the things to try to make my question more legible to people, since I quickly realized that without the code my question was unclear. – xthursdayx Oct 16 '20 at 19:58
  • @xthursdayx, ...btw, I notice you haven't accepted any answer yet, so your question shows as still unsolved. Is there anything you'd expect to see additionally addressed for an answer to the original question to be considered complete? – Charles Duffy Oct 16 '20 at 22:01
  • I accepted it, as your answer does indeed answer the original question. For some reason my script is not working, even though the two parts work separately (part 1 being the find command and part 2 being the code from the ideone link you shared), but I guess I'll start a new question to troubleshoot that. Thanks again. – xthursdayx Oct 16 '20 at 23:32
  • @xthursdayx, ...if it helps to better write that new question, using `set -x` to enable debug tracing is helpful more often than not. – Charles Duffy Oct 16 '20 at 23:42
  • In case you're interested, here is the new question: https://stackoverflow.com/questions/64397717/using-find-and-bashs-regex-support-to-rename-pdf-files – xthursdayx Oct 16 '20 at 23:51
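
Pulling the comment thread together, here is a rough sketch of how the NUL-delimited `find` loop and the regex approach could be combined for the `The New Town Cryer - No. 1,032 [01 Oct 2020].pdf` format. It assumes GNU `find` and GNU `date`. The function names `new_name_for` and `rename_pdfs` are made up for this illustration, the regex is a guess at the format based on the single example filename given, and `-print0` with the full path is used here in place of the `-printf '%f\0'` suggested above.

```bash
#!/bin/bash
# Hypothetical helper: print the new name for one old-style filename,
# or return non-zero if the name does not match the assumed pattern.
new_name_for() {
  local re='^(.*) - No\. [0-9,]+ \[([0-9]{2}) ([[:alpha:]]+) ([0-9]{4})\](\.[^.]+)$'
  [[ $1 =~ $re ]] || return 1
  local title=${BASH_REMATCH[1]} ext=${BASH_REMATCH[5]} iso
  # The captured pieces are passed to date as one argument, never evaluated.
  iso=$(date -d "${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]}" +%Y-%m-%d) || return 1
  printf '%s_%s%s\n' "$iso" "${title// /_}" "$ext"
}

# Hypothetical driver: walk a directory and print the mv commands that
# would be run. This is a dry run; drop the "echo" to actually rename.
rename_pdfs() {
  local dir=$1 path newname
  while IFS= read -r -d '' path; do
    if newname=$(new_name_for "${path##*/}"); then
      echo mv -- "$path" "${path%/*}/$newname"
    fi
  done < <(find "$dir" -type f -name '*.pdf' -print0)
}
```

Calling `rename_pdfs "$1"` at the end of a script would process the directory given on the command line. The loop keeps the full path from `find` because `%f` prints only the file's basename, so a plain `mv "$name" "$newname"` only works when the file is in the current working directory.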

This might work for you (GNU sed):

sed -E 's/(.*) - (.*)\.(.*)/echo $(date -d "\2" "+%Y-%m-%d")-\1.\3/e' file

Match the file name and then use the e flag to evaluate the echo command.

potong
  • I really recommend against anyone _ever_ using the `e` sed flag, especially when a command contains references that expand to a `.*`'s contents; it's `eval`-equivalent, so it's easy to have security bugs when a value unexpectedly contains something that acts like shell syntax. – Charles Duffy Oct 14 '20 at 17:23
  • ...if you had an input file created by the command `touch $'Hello - $(rm -rf ~).pdf'`, you don't want `date -d "$(rm -rf ~)" "+%Y-%m-%d"` to be run. And while that's a malicious example, less-intentional ones can happen too. – Charles Duffy Oct 14 '20 at 17:25
  • @CharlesDuffy agreed - user beware – potong Oct 14 '20 at 17:27
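
For contrast with the `e` flag, here is a small sketch of treating the captured text purely as data. It assumes GNU `date`. The function name `describe_date` is invented for this example, and the hostile filename uses a harmless `echo` as its payload in place of the `rm -rf ~` from the comment above.

```bash
#!/bin/bash
# Hypothetical helper: extract the text between " - " and the extension
# and hand it to date as a single quoted argument. Nothing is eval'd.
describe_date() {
  local re='^(.*) - (.*)\.([^.]+)$' captured iso
  [[ $1 =~ $re ]] || return 1
  captured=${BASH_REMATCH[2]}
  if iso=$(date -d "$captured" +%Y-%m-%d 2>/dev/null); then
    echo "parsed date: $iso"
  else
    echo "not a date, skipping: $captured"
  fi
}

describe_date 'The New Town Cryer - 01 Oct 2020.pdf'
describe_date 'Hello - $(echo INJECTED).pdf'
```

Because `$captured` is expanded inside double quotes and never re-parsed by a shell, the `$(...)` text reaches `date` as a literal string; `date` rejects it as an invalid date and the embedded command is never run.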