237

I am trying to write a bash script containing a function that, when given a .tar, .tar.bz2, .tar.gz, etc. file, runs tar with the relevant switches to decompress it.

I am using if/elif/then statements that test what the filename ends with, but I cannot get it to match using regex metacharacters.

To avoid constantly rewriting the script, I am using 'test' at the command line. I thought the statement below should work. I have tried every combination of brackets, quotes and metacharacters possible, and it still fails.

test sed-4.2.2.tar.bz2 = tar\.bz2$; echo $?
(this returns 1, false)

I'm sure the problem is a simple one and I've looked everywhere, yet I cannot fathom how to do it. Does someone know how I can do this?

Benjamin W.
user1587462

6 Answers

368

To match regexes you need to use the =~ operator.

Try this:

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

Alternatively, you can use wildcards (instead of regexes) with the == operator:

[[ sed-4.2.2.tar.bz2 == *tar.bz2 ]] && echo matched

If portability is not a concern, I recommend using [[ instead of [ or test as it is safer and more powerful. See What is the difference between test, [ and [[ ? for details.
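
For context on why the `test` command in the question returns 1: `=` in `test` (and `[`) is a literal string comparison, so the whole filename is compared with the text `tar.bz2$` rather than matched against a pattern. Below is a minimal sketch contrasting the three approaches. The helper name `is_tar_bz2` is made up for illustration, and the `case` variant is included as an option for shells without `[[`.

```shell
#!/usr/bin/env bash
# Sketch only; is_tar_bz2 is a hypothetical helper, not part of the answer.

name=sed-4.2.2.tar.bz2

# test / [ : '=' compares strings literally, so this branch is false.
if test "$name" = tar\.bz2$; then
  echo "test: equal"
else
  echo "test: not equal"
fi

# [[ =~ ]] : the right-hand side is an extended regular expression.
if [[ $name =~ \.tar\.bz2$ ]]; then
  echo "regex: matched"
fi

# case : glob patterns, available in any POSIX shell.
is_tar_bz2() {
  case $1 in
    *.tar.bz2) return 0 ;;
    *)         return 1 ;;
  esac
}
is_tar_bz2 "$name" && echo "case: matched"
```

The `case` form is the one to reach for when the script may run under a shell other than bash.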

dogbane
  • Be careful with the glob wildcard matching in the second example. Inside [[ ]], the * is not expanded as it usually is, to match filenames in the current directory that match a pattern. Your example works, but it's really easy to over-generalize and mistakenly believe that * means to match anything in any context. It only works like that inside [[ ]]. Otherwise, it expands to the existing filenames. – Alan Porter Feb 26 '14 at 16:02
  • I tried to use quotes on the regex and failed; [this answer](http://stackoverflow.com/a/218217/1422630) helped on making this work `check="^a.*c$";if [[ "abc" =~ $check ]];then echo match;fi` we need to store the regex in a var – Aquarius Power Jun 15 '14 at 20:27
  • Also note that the pattern must NOT be in quotes: `[[ sed-4.2.2.tar.bz2 == "*tar.bz2" ]]` wouldn't work. – pevik Feb 27 '15 at 08:29
  • FWIW, the syntax for negation (i.e. *does not match*) is `[[ ! foo =~ bar ]]`. – Skippy le Grand Gourou Jan 02 '17 at 15:56
  • dash doesn't support the `-n 1` parameter, neither does it put it automatically into a `$REPLY` variable. Watch Out! –  Feb 04 '17 at 16:32
  • If portability is a concern, then don't use the `=~` operator! – miken32 Jan 25 '18 at 00:37
  • The page you linked to mentions _RegularExpression matching =~ [is] (not available) [in] old test [_ so I guess it's not an option in the "instead of" part. – James Brown Jan 25 '18 at 14:53
  • Why do quotes cause the regex to not match? I thought it was a best practice to quote any variable usage, like `"$foo"`, so `[[ "$foo" == "^release/" ]]` seems like it should work... – void.pointer Aug 02 '18 at 14:42
78

A Function To Do This

# Quote "$1" throughout so filenames containing spaces or glob characters work.
extract () {
  if [ -f "$1" ] ; then
      case "$1" in
          *.tar.bz2)   tar xvjf "$1"    ;;
          *.tar.gz)    tar xvzf "$1"    ;;
          *.bz2)       bunzip2 "$1"     ;;
          *.rar)       rar x "$1"       ;;
          *.gz)        gunzip "$1"      ;;
          *.tar)       tar xvf "$1"     ;;
          *.tbz2)      tar xvjf "$1"    ;;
          *.tgz)       tar xvzf "$1"    ;;
          *.zip)       unzip "$1"       ;;
          *.Z)         uncompress "$1"  ;;
          *.7z)        7z x "$1"        ;;
          *)           echo "don't know '$1'..." ;;
      esac
  else
      echo "'$1' is not a valid file!"
  fi
}

Other Note

In response to Aquarius Power's comment above that we need to store the regex in a variable:

The variable BASH_REMATCH is set after you match the expression, and ${BASH_REMATCH[n]} holds the text matched by the nth group wrapped in parentheses, i.e. in the following example ${BASH_REMATCH[1]} is "compressed" and ${BASH_REMATCH[2]} is ".gz":

if [[ "compressed.gz" =~ ^(.*)(\.[a-z]{1,5})$ ]]; then
  echo "${BASH_REMATCH[2]}"
else
  echo "Not proper format"
fi

(The regex above isn't meant to be a valid one for file naming and extensions, but it works for the example)
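
To illustrate the point about keeping the regex in a variable, here is a small sketch that wraps the same pattern in a function. The name `split_ext` is hypothetical and the pattern is the example regex from above, not a robust extension parser.

```shell
#!/usr/bin/env bash
# split_ext is a made-up name used only for this sketch.
split_ext() {
  # Storing the pattern in a variable sidesteps quoting problems:
  # an unquoted $re on the right of =~ is treated as a regex.
  local re='^(.*)(\.[a-z]{1,5})$'
  if [[ $1 =~ $re ]]; then
    # Group 1 is the base name, group 2 the extension (with its dot).
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

split_ext compressed.gz
```

Quoting the variable as `"$re"` would make bash match its contents as a literal string, which is why it is left unquoted here.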

Paul Beckingham
duality
  • also note that with BSD tar you can use "tar xf" for all formats and don't need separate commands or this function whatsoever. – Good Person May 11 '16 at 23:26
  • `a` on GNU tar or `p` on BSD tar to explicitly tell it to automatically infer compression type from extension. GNU tar will not do it automatically otherwise, and I'm guessing from @GoodPerson 's comment that BSD tar does do it by default. – Mark K Cowan Apr 07 '17 at 19:42
  • 7z can unpack .. AR, ARJ, CAB, CHM, CPIO, CramFS, DMG, EXT, FAT, GPT, HFS, IHEX, ISO, LZH, LZMA, MBR, MSI, NSIS, NTFS, QCOW2, RAR, RPM, SquashFS, UDF, UEFI, VDI, VHD, VMDK, WIM, XAR and Z. see https://www.7-zip.org/ – mosh Mar 17 '18 at 16:11
28

I don't have enough rep to comment here, so I'm submitting a new answer to improve on dogbane's answer. The dot . in the regexp

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

will actually match any character, not only the literal dot between 'tar' and 'bz2', for example

[[ sed-4.2.2.tar4bz2 =~ tar.bz2$ ]] && echo matched
[[ sed-4.2.2.tar§bz2 =~ tar.bz2$ ]] && echo matched

or anything that doesn't require escaping with '\'. The strict syntax should then be

[[ sed-4.2.2.tar.bz2 =~ tar\.bz2$ ]] && echo matched

or you can go even stricter and also include the previous dot in the regex:

[[ sed-4.2.2.tar.bz2 =~ \.tar\.bz2$ ]] && echo matched
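
A quick way to see the difference is to run both patterns over a few names. This is only a sketch; the variable names `loose` and `strict` and the third sample filename are made up for the comparison.

```shell
#!/usr/bin/env bash
# Compare an unescaped and an escaped dot against a few sample names.
loose='tar.bz2$'      # '.' matches any single character
strict='\.tar\.bz2$'  # '\.' matches only a literal dot

for name in sed-4.2.2.tar.bz2 sed-4.2.2.tar4bz2 sed-4.2.2-tar.bz2; do
  if [[ $name =~ $loose ]];  then l=yes; else l=no; fi
  if [[ $name =~ $strict ]]; then s=yes; else s=no; fi
  echo "$name loose=$l strict=$s"
done
```

Only the first name satisfies the strict pattern, while all three satisfy the loose one.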
user2066480
18

Since you are using bash, you don't need to create a child process for doing this. Here is one solution which performs it entirely within bash:

[[ $TEST =~ ^(.*):\ +(.*)$ ]] && TEST=${BASH_REMATCH[1]}:${BASH_REMATCH[2]}

Explanation: The groups before and after the sequence "colon and one or more spaces" are stored by the pattern match operator in the BASH_REMATCH array.
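
As a rough illustration of the one-liner above, the sketch below applies the same regex inside a function. The function name `squeeze_after_colon` and the sample inputs are assumptions made for the example.

```shell
#!/usr/bin/env bash
# squeeze_after_colon is a hypothetical wrapper around the one-liner above.
squeeze_after_colon() {
  local s=$1
  local re='^(.*): +(.*)$'
  # On a match, rebuild the string from the two captured groups,
  # dropping the spaces that followed the colon.
  if [[ $s =~ $re ]]; then
    s=${BASH_REMATCH[1]}:${BASH_REMATCH[2]}
  fi
  printf '%s\n' "$s"
}

squeeze_after_colon 'name:    value'
squeeze_after_colon 'no colon here'
```

Because the first group is greedy, a string with several "colon plus spaces" sequences is split at the last one.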

user1934428
3

shopt -s nocasematch

if [[ sed-4.2.2.$LINE =~ (yes|y)$ ]]
 then exit 0 
fi
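
The snippet above relies on `shopt -s nocasematch`, which makes matching inside `[[ ]]` case-insensitive. Here is a small sketch of the same idea; the function name `is_yes` is made up, and the regex is anchored at both ends (an assumption about the intent) so that a word such as "maybe" is not accepted just because it ends in "y".

```shell
#!/usr/bin/env bash
# is_yes is a hypothetical helper. The body runs in a subshell ( ... )
# so that nocasematch does not leak into the rest of the script.
is_yes() (
  shopt -s nocasematch
  [[ $1 =~ ^(yes|y)$ ]]
)

for answer in yes Y YES maybe no; do
  if is_yes "$answer"; then
    echo "$answer: accepted"
  else
    echo "$answer: rejected"
  fi
done
```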
Shyam Gupta
3
if [[ $STR == *pattern* ]]
then
    echo "It is the string!"
else
    echo "It's not him!"
fi

Works for me! GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)

  • This is extremely dangerous; it only behaves without undefined behavior for you because you have no files in the current directory named the literal substring "pattern". Go ahead, create some files named like that, and substring expansion will match the files and break everything horribly with multicolored heisenbugs. – i336_ Sep 01 '18 at 08:24
  • But I have done an experiment with files `1pattern`, `*pattern*`, `pattern2` and `pattern` in the current directory. This script works as expected. Could you please provide me with your test result? @i336_ –  Mar 07 '19 at 08:39
  • @i336: I don't think so. Within `[[ ... ]]`, the rhs glob pattern does **not** expand according to the current directory, as it would usually do. – user1934428 Mar 15 '19 at 08:45
  • @i336_ No. Within `[[...]]`, Bash doesn't perform filename expansion. In bash manual, `Word splitting and filename expansion are not performed on the words between the [[ and ]];` – rosshjb Jul 02 '20 at 07:58
  • @juancortez : It also does not really fulfil the requirements of the OP, who - for whatever reason - asked for matching a **regexp**. – user1934428 Apr 08 '21 at 13:34
  • Adding a reference to bash manual cited by @rosshjb : [3.2.5.2 Conditional Constructs](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Conditional-Constructs) – ジョージ Jul 06 '23 at 01:27
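
The disagreement in the comments above can be checked with a small experiment. This is a sketch under stated assumptions: it creates a scratch directory with `mktemp -d`, and the file name `1pattern2` and the value of `STR` are invented for the test.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory holding a file whose name matches the glob.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 1pattern2

STR='some pattern here'

# Inside [[ ]], *pattern* is matched against $STR; no filename expansion occurs.
[[ $STR == *pattern* ]] && echo "[[ ]]: matched"

# With [ ], the unquoted glob is first expanded to the filename 1pattern2,
# so $STR is compared with that filename and the test fails.
[ "$STR" = *pattern* ] || echo "[ ]: no match"

# Clean up the scratch directory.
cd - >/dev/null && rm -r "$tmp"
```

The second test is where the concern about files in the current directory does apply, which is one reason to prefer `[[ ]]` for pattern matching in bash.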