
I am trying to build a generic formatter for my MP3 file names (very important) with bash, and a large part of this is being able to move text around using regex capture groups. For example, I am trying to remove the parentheses () from around "ft. Kevin Parker".

oldfilename="Mark Ronson - 02 Summer Breaking (ft. Kevin Parker).mp3"

newfilename=$(echo $oldfilename | sed -E "s/ft.\(*\)/ft.\1/g")

This causes the error:

sed: 1: "s/ft.\(*\)/ft.\1/g": \1 not defined in the RE

I have tried escaping and not escaping the (), and adding and removing the -E switch as recommended in the question ".bash_profile sed: \1 not defined in the RE". Help?!

mummybot

1 Answer


If you use -E, then \( and \) match literal parentheses; to capture, you use plain ( and ). Here, you want to remove the parentheses, so you need to match a literal (, capture the content up to the next ), match but not capture the closing ), and replace the whole lot with just the capture:

newfilename=$(echo "$oldfilename" | sed -E "s/\((ft[^)]*)\)/\1/g")

Or, for amusement value, you can do it without -E:

newfilename=$(echo "$oldfilename" | sed -e "s/(\(ft[^)]*\))/\1/g")

(The -e is a cheat; it just identifies an expression that's part of the sed script. It does not mean 'opposite of -E' and you could have both -E and one or more -e …arg… argument pairs.)
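
To compare the two forms side by side, here is a minimal sketch using the file name from the question. The variable names ere and bre are only for illustration, and printf is used instead of echo purely out of habit; neither is required.

```shell
#!/bin/sh
oldfilename="Mark Ronson - 02 Summer Breaking (ft. Kevin Parker).mp3"

# Extended regex (-E): \( and \) are literal, plain ( and ) capture.
ere=$(printf '%s\n' "$oldfilename" | sed -E 's/\((ft[^)]*)\)/\1/g')

# Basic regex (default): plain ( and ) are literal, \( and \) capture.
bre=$(printf '%s\n' "$oldfilename" | sed 's/(\(ft[^)]*\))/\1/g')

printf '%s\n' "$ere"   # Mark Ronson - 02 Summer Breaking ft. Kevin Parker.mp3
printf '%s\n' "$bre"   # same result
```

Both commands strip the parentheses and keep the text between them; only the escaping convention differs.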

Note that the file name should be in quotes unless you are deliberately ensuring that any leading or trailing blanks are removed, any internal tabs or newlines are replaced with blanks, and any runs of multiple blanks in the name are replaced with a single blank. If you do want that 'space normalization', then leaving the quotes out achieves it.
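
The effect of the quotes can be seen in a small sketch. The file name below is made up for the demonstration, and the script assumes the default IFS (space, tab, newline).

```shell
#!/bin/sh
# Hypothetical file name with leading, trailing and repeated blanks.
name="  Mark   Ronson - 02 Summer Breaking.mp3  "

# Unquoted: the shell splits the value on whitespace, and echo
# rejoins the resulting words with single blanks.
unquoted=$(echo $name)

# Quoted: the value reaches echo as a single argument, unchanged.
quoted=$(echo "$name")

printf '[%s]\n' "$unquoted"   # [Mark Ronson - 02 Summer Breaking.mp3]
printf '[%s]\n' "$quoted"     # [  Mark   Ronson - 02 Summer Breaking.mp3  ]
```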

Jonathan Leffler
  • Omitting quotes will also cause the shell to perform wildcard expansion, meaning that asterisks, question marks, and square brackets may disappear or be replaced by something else, depending on what files they happen to match. – tripleee Jan 21 '15 at 16:46
  • @tripleee: Oh, yeah; that too. Quotes are definitely better; you can normalize the spacing in the name in the `sed` script if that's necessary. – Jonathan Leffler Jan 21 '15 at 16:49
  • Thanks for interpreting what I needed and thus filling in my incomplete regex: I had missed the () from my example. This works a treat. One thing that confuses me about your answer, however, is the closing parenthesis after the ^. It appears to close the capture group in the middle of the square brackets, and my poor understanding of regex would have it closing after the *. – mummybot Jan 21 '15 at 16:52
  • In `[^)]*`, the `)` sits inside the square brackets, so it is a member of the negated character class rather than the end of the capture group: `[^)]*` matches a sequence of zero or more 'not close parenthesis' characters. The group is closed by the `)` that follows the `*`. Yes, it is confusing; it also means there are unbalanced parentheses in the expression overall. You could probably use `[^()]*` (a sequence of zero or more non-parentheses) which balances up the parentheses. – Jonathan Leffler Jan 21 '15 at 16:55
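
As a rough check of the `[^()]*` suggestion, the sketch below runs both character classes over the file name from the question and over a second, invented name containing a stray open parenthesis. The helper names strip_a and strip_b and the second file name are assumptions made only for this illustration.

```shell
#!/bin/sh
# strip_a uses [^)]*, strip_b uses the balanced-looking [^()]*.
strip_a() { printf '%s\n' "$1" | sed -E 's/\((ft[^)]*)\)/\1/g'; }
strip_b() { printf '%s\n' "$1" | sed -E 's/\((ft[^()]*)\)/\1/g'; }

plain="Mark Ronson - 02 Summer Breaking (ft. Kevin Parker).mp3"
nested="Song (ft. A (B).mp3"   # hypothetical name with a stray "("

a_plain=$(strip_a "$plain")    # parentheses removed
b_plain=$(strip_b "$plain")    # same result as a_plain
a_nested=$(strip_a "$nested")  # capture runs across the inner "("
b_nested=$(strip_b "$nested")  # no match, name left unchanged

printf '%s\n' "$a_plain" "$b_plain" "$a_nested" "$b_nested"
```

For ordinary names the two classes behave identically; they differ only when another open parenthesis appears before the closing one, where `[^()]*` declines to match instead of capturing across it.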