As detailed here, I am trying to troubleshoot a script I've been working on to rename a series of PDFs with filenames in this format: `The New Town Cryer - No. 1,032 [01 Oct 2020].pdf` to this: `2020-10-01_-_The_New_Town_Cryer.pdf`. The script uses `find` to locate all PDFs in a given directory and should then rename them using bash regex matching (`BASH_REMATCH`). So far, the `find` part of my script correctly finds the PDFs, and the renaming logic also works on its own. However, when I combine the two parts I get no output.

Original Script:

#!/bin/bash

set -x 

find "$1" -name "*.pdf" -type f -printf '%f\0' | while IFS= read -r -d '' oldname; do #find all pdfs
  : oldname="$oldname"
  date_re='(^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*)'
  if [[ $oldname =~ $date_re ]]; then
    basename=${BASH_REMATCH[1]}
    day=${BASH_REMATCH[2]}
    month=${BASH_REMATCH[3]}
    year=${BASH_REMATCH[4]}
    ext=${BASH_REMATCH[5]}
    new_date=$(date -d "${day} ${month} ${year}" +%Y-%m-%d)
    newname="${new_date} - ${basename}${ext}"
    newname=${newname//[[:space:]]/_}
    echo "Old name: $oldname" # these lines will eventually be replaced with this line: mv "$oldname" "$newname"
    echo "New name: $newname"
  fi
done 

I added set -x to the beginning of my script, as well as : oldname="$oldname" after do, and then ran it using the command ./pub_renamer.sh /mnt/user/Pubs/Unsorted/The.New.Town.Cryer.-.No.1,032.[01.Oct.2020].

The result is this:

+ find '/mnt/user/Pubs/Unsorted/The.New.Town.Cryer.-.No.1,032.[01.Oct.2020]' -name '*.pdf' -type f -printf '%f\0'
+ IFS=
+ read -r -d '' oldname
+ : 'oldname=The New Town Cryer - No. 01,032 [01 Oct 2020].pdf'
+ date_re='(^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*)'
+ [[ The New Town Cryer - No. 01,032 [01 Oct 2020].pdf =~ (^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*) ]]
+ IFS=
+ read -r -d '' oldname

I also tried changing the %f in the find command to %p and running the script with the same directory as above and this is the output:

+ find '/mnt/user/Pubs/Unsorted/The.New.Town.Cryer.-.No.1,032.[01.Oct.2020]' -name '*.pdf' -type f -printf '%p\0'
+ IFS=
+ read -r -d '' oldname
+ : 'oldname=/mnt/user/Pubs/Unsorted/The.New.Town.Cryer.-.No.1,032.[01.Oct.2020]/The New Town Cryer - No. 01,032 [01 Oct 2020].pdf'
+ date_re='(^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*)'
+ [[ /mnt/user/Pubs/Unsorted/The.New.Town.Cryer.-.No.1,032.[01.Oct.2020]/The New Town Cryer - No. 01,032 [01 Oct 2020].pdf =~ (^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*) ]]
+ IFS=
+ read -r -d '' oldname

It seems that the regex isn't matching `oldname` (the trace never reaches the assignments inside the `if`, so `newname` is never built), but I'm not sure why.

Edit: This version of the script appears to be working now:

#!/bin/bash

find "$1" -name "*.pdf" -type f -printf '%f\0' | while IFS= read -r -d '' oldname; do #find all pdfs
  : oldname="$oldname"
  date_re='(^.*) - [^[]+\[([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})\](.*)'
  if [[ $oldname =~ $date_re ]]; then
    basename=${BASH_REMATCH[1]}
    day=${BASH_REMATCH[2]}
    month=${BASH_REMATCH[3]}
    year=${BASH_REMATCH[4]}
    ext=${BASH_REMATCH[5]}
    new_date=$(date -d "${day} ${month} ${year}" +%Y-%m-%d)
    newname="${new_date} - ${basename}${ext}"
    newname=${newname//[[:space:]]/_}
    echo "Old name: $oldname"
    echo "New name: $newname"
  fi
done
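
The difference between the two patterns can be checked in isolation. The following is a minimal sketch, assuming the sample filename from the top of the question; the `matches` helper and the variable names `name`, `old_re`, and `new_re` are introduced here for illustration only. The original pattern requires two digits immediately after ` - `, but in these filenames ` - ` is followed by `No. 1,032 [`, so it cannot match; the revised pattern skips everything up to the opening `[` and matches the square brackets explicitly.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# succeeds if string $1 matches the extended regex in $2.
matches() { [[ $1 =~ $2 ]]; }

# Sample filename in the format described in the question.
name='The New Town Cryer - No. 1,032 [01 Oct 2020].pdf'

# Pattern from the original script: expects the day right after " - ".
old_re='(^.*) - ([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})(.*)'

# Pattern from the working script: skips "No. 1,032 " and matches "[" and "]".
new_re='(^.*) - [^[]+\[([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})\](.*)'

if matches "$name" "$old_re"; then
  echo "original pattern: match"
else
  echo "original pattern: no match"
fi

if matches "$name" "$new_re"; then
  echo "revised pattern: match"
  # BASH_REMATCH is a global array, so it is still set after the helper returns.
  for i in 1 2 3 4 5; do
    printf 'group %d: %s\n' "$i" "${BASH_REMATCH[i]}"
  done
else
  echo "revised pattern: no match"
fi
```

Printing the capture groups this way shows which part of the filename ends up in `basename`, `day`, `month`, `year`, and `ext` before any renaming is attempted.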
xthursdayx
  • *I figured out that the problem is that the statement after do only works if the pdf filename is in quotes* -- that... doesn't really make sense. `find` is writing literal strings; they're not _expected_ to contain syntactic quotes. Could you show your work, in terms of what led you to believe that lack of quotes was a problem? – Charles Duffy Oct 16 '20 at 23:55
  • BTW, one thing that I often find useful in this kind of `while read` loop is to immediately after your `do` (on the same line is fine), put `: oldname="$oldname"` -- that way when you run your script with `bash -x yourscript`, it'll print out a line like `+ : 'oldname= The New Town Cryer - No. 1,032 [01 Oct 2020].pdf'`, showing the variable's _actual_ value in practice (with syntactic quoting added; those quotes aren't part of the value, but they show you what shell syntax you'd need to use to get the same value as a literal). – Charles Duffy Oct 16 '20 at 23:57
  • ...and if you _do_ run the script with `set -x`, please [edit] the output of that into the question. (Doesn't need to go beyond a single time through the loop / a single file found; that should be enough). – Charles Duffy Oct 16 '20 at 23:58
  • BTW, you probably want to change `%f` to `%p`, if you expect a `mv` command to be able to use your paths in cases where the files exist in subdirectories. – Charles Duffy Oct 17 '20 at 00:02
  • I just updated the question to explain why I thought that a lack of `' '` might be the problem. – xthursdayx Oct 17 '20 at 00:09
  • When you say "run the script with the variable `The New Town Cryer - No. 1,032 [01 Oct 2020].pdf`" (asserting that there are no quotes), how exactly are you doing that? Do you run `yourscript The New Town Cryer - No. 1,032 [01 Oct 2020].pdf`? Because that doesn't assign `The New Town Cryer - No. 1,032 [01 Oct 2020].pdf` to `$1` -- what it assigns to `$1` is the word `The`. In the command `yourscript 'The New Town Cryer - No. 1,032 [01 Oct 2020].pdf'`, quotes don't go _into_ `$1`; they tell the shell what other parts of your command line go into `$1`, as opposed to going into `$2`, `$3`, etc. – Charles Duffy Oct 17 '20 at 00:30
  • BTW, please consider my request for `set -x` trace output to be included in the question reiterated. (`+x` turns tracing _off_, it's the opposite of what I'm asking for). – Charles Duffy Oct 17 '20 at 00:32
  • BTW, `find: ‘’: No such file or directory` from the command `find "$1"` is just telling you that `$1` is empty. Consider `find "${1:-.}"` to make `.` a default when no starting directory is explicitly passed as an argument. – Charles Duffy Oct 17 '20 at 00:38
  • Sorry, I misunderstood what you mean by `set -x`. I've added it to my script now and have updated my original post with the results. – xthursdayx Oct 17 '20 at 06:03
  • Thank you. Could you add `: oldname="$oldname"` after the `do`, as previously requested, and update the `set -x` output accordingly? – Charles Duffy Oct 17 '20 at 13:04
  • By the way, I'm surprised that you're getting results with `.`s in place of spaces in your command line -- while `.` is a single-character wildcard in regex, the equivalent in glob syntax is `?`. Unless you have a directory with periods in its name, and that directory contains names with spaces? This is something where switching to `%p` or `%P` (to include the directory name in the output of `find` -- which will be needed for `mv` to work anyhow) would make things more clear. – Charles Duffy Oct 17 '20 at 13:11
  • `printf 'oldname=%q\n' "$oldname" >&2` would serve the same purpose (as getting `set -x` logging of the `: oldname="$oldname"` statement), of getting the precise value of the oldname variable in a form that shows hidden characters and the like. – Charles Duffy Oct 17 '20 at 13:54
  • When I was testing my (incorrect) answer below, I had copied the `date_re='(^.*) - [^[]+\[([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})\](.*)'` that was at one point part of the OP. That worked for me, and indeed it does also work without the extra quotes, so I don't see what's wrong with the original script (above) when using that pattern. The pattern above (at this time) is missing the ` [^[]+`. – B. Morris Oct 17 '20 at 14:23
  • I have updated the script with your suggestions @Charles Duffy and pasted the output above. – xthursdayx Oct 17 '20 at 15:55
  • @B.Morris, ...maybe modify your answer to suggest that fix to the regex, then (or delete it and post a new one that does so)? – Charles Duffy Oct 17 '20 at 16:06
  • @xthursdayx, ...B.Morris has a good point. See the regex I provided in https://ideone.com/nl68la -- it's explicitly matching the square brackets. The one you use in the question here doesn't do that. – Charles Duffy Oct 17 '20 at 16:07
  • ...and see https://ideone.com/VpxLtF, showing that the regex you use in your question here doesn't match, while the one I gave you earlier (not in the answer to your original question -- which didn't have square brackets in the input -- but in the ideone link in a follow-on comment after you'd described your _real_ filenames' format) does. – Charles Duffy Oct 17 '20 at 16:10
  • I must have copied something or gotten confused about the different ideone versions. It seems to be working now. I will update the original question. Do you want to give an answer that I can approve here? – xthursdayx Oct 17 '20 at 16:58
  • Two other quick questions, 1) will this work to move and rename the pdfs if I sub in the `%p` instead of `%f` in the find command? And 2) will this work if it matches multiple pdfs at the same time? They will each be in different subfolders under the `Unsorted` directory with directory names as above: `Publication.Name.-.No.#.[Day.Month.Year]`? – xthursdayx Oct 17 '20 at 17:01
  • @CharlesDuffy it wasn’t clear what the protocol was for submitting a different answer vs deleting it. – B. Morris Oct 17 '20 at 17:51
  • @xthursdayx, given what we've discovered during troubleshooting, I'm not sure it's a valid question, so I don't presently intend to add an answer. I'd call this one "caused by a typo or other issue unlikely to help others", or might possibly choose to close as a duplicate of [Reference: What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean). Correspondingly, I would generally suggest deleting it at this point. – Charles Duffy Oct 17 '20 at 17:56
  • ...A good question has answers that are helpful to other people, but nobody but you has filenames in your exact input format; the question would need to be significantly reworked to raise an issue not specific to that format. – Charles Duffy Oct 17 '20 at 17:57
  • Got ya, okay. Well, thanks very much for your help! – xthursdayx Oct 17 '20 at 19:36
  • @B.Morris, ...either a rewrite-scale edit or delete-and-replace is acceptable, when the new version is going to be almost entirely different; normally, I'd go with the edit unless there's a comment thread that won't apply at all to the new one, and where you think having the old comments around would add confusion for readers. (That said, I'm around, active, and happy to delete my old comments if the answer is edited enough that they no longer make sense). – Charles Duffy Oct 17 '20 at 21:05
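
On the two follow-up questions about `%p` and multiple matches, one possible shape for the final script is sketched below. It is a hedged sketch under stated assumptions, not a tested drop-in replacement: it assumes bash, GNU `find`, and GNU `date -d`, and the names `new_name_for`, `rename_pdfs`, and `DRY_RUN` are hypothetical. It uses `-print0` (which emits the full path, like `-printf '%p\0'`), applies the regex only to the filename part, and keeps each renamed file in its own subdirectory. It also uses `${1:-.}` and `printf '%q'` as suggested in the comments.

```shell
#!/bin/bash
# Sketch only. new_name_for, rename_pdfs and DRY_RUN are hypothetical names.
# Assumes GNU find and GNU date (for `date -d`).

date_re='(^.*) - [^[]+\[([[:digit:]]{2}) ([[:alpha:]]+) ([[:digit:]]{4})\](.*)'

# Print the new filename for an old filename; return 1 if it does not match.
new_name_for() {
  local oldname=$1 new_date newname
  [[ $oldname =~ $date_re ]] || return 1
  # Capture groups: 1=title, 2=day, 3=month, 4=year, 5=extension
  new_date=$(date -d "${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]}" +%Y-%m-%d) || return 1
  newname="${new_date} - ${BASH_REMATCH[1]}${BASH_REMATCH[5]}"
  printf '%s\n' "${newname//[[:space:]]/_}"
}

# Rename every matching PDF under a directory (default: current directory).
rename_pdfs() {
  local start=${1:-.} path dir newname
  while IFS= read -r -d '' path; do
    # Match against the filename only, so dots and brackets in directory
    # names cannot interfere with the regex.
    newname=$(new_name_for "${path##*/}") || continue
    dir=$(dirname -- "$path")
    if [[ ${DRY_RUN:-1} == 1 ]]; then
      # %q shows the exact values, including hidden characters.
      printf 'would rename: %q -> %q\n' "$path" "$dir/$newname"
    else
      # -n refuses to overwrite an existing file with the same new name.
      mv -n -- "$path" "$dir/$newname"
    fi
  done < <(find "$start" -name '*.pdf' -type f -print0)
}

# Run only when executed as a script, not when sourced.
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
  rename_pdfs "$@"
fi
```

With `DRY_RUN` left at its default of `1`, the function only prints what it would do; something like `DRY_RUN=0 ./pub_renamer.sh /mnt/user/Pubs/Unsorted` would perform the renames. Because the loop reads every path that `find` emits, several PDFs in different subfolders are handled in one run.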

0 Answers