I'm trying to write a shell script that extracts a string that occurs between two other strings using a regex lookaround (though please let me know if there's a better way). The string I'm searching through is the path /gdrive/My Drive/Github/gbks/NC_004113.1.gbk (in reality I have several of these strings) and the part that I want to extract is the NC_004113.1 (or whatever is in its place in another similar string). In other words, the part that I want to extract will always be flanked by /gdrive/My Drive/Github/gbks/ and .gbk.

I'm playing around with how to do this, and I thought that a regex lookaround might work. To complicate things slightly, the string itself is stored in a variable. I started to try the following, just to see if it would run, which it did:

input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP "$input_directory"/.*

However, when I tried to do the same thing with a lookaround, the command failed:

input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory")'

As a sanity check, I tried to pass the string directly as the expression, but it only worked when I omitted the quotation marks like so:

input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?=/gdrive/My Drive/Github/gbks/)'

This line actually gave me the output that I wanted (though I need to modify it so I'm passing the string in as a variable):

echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<=/gdrive/My Drive/Github/gbks/).*(?=.gbk)'

Ultimately, I think the code should look something like:

input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory").*(?=.gbk)'

Thanks in advance!

-Rob

rchurt
  • Doesn't have anything to do with regexps. This is a duplicate of [Difference between single and double quotes in Bash](https://stackoverflow.com/questions/6697753/difference-between-single-and-double-quotes-in-bash) – oguz ismail Jun 10 '20 at 04:32
  • Thanks @oguzismail, but I'm still not understanding—how exactly would you get the last two lines I posted to work? – rchurt Jun 10 '20 at 04:39

1 Answer

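In grep -oP '(?<="$input_directory")', the variable input_directory won't be expanded because of the outer single quotes, so grep receives the literal text "$input_directory" as part of the pattern. You can close the single quotes around the variable and do something like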

grep -oP '(?<='"$input_directory"')'

instead.
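
Applied to the full command from the question, that quoting trick might look like the sketch below. It assumes GNU grep built with PCRE support (the -P option) and bash; the variable names path, accession, and name are only illustrative and do not come from the question. The dot in .gbk is escaped so that it matches a literal dot. Note that the contents of input_directory are still interpreted as a regex, so a directory name containing regex metacharacters would need extra care. Since the question also asks for a better way, the second half shows bash parameter expansion, which avoids grep and regexes entirely.

```shell
#!/usr/bin/env bash

input_directory="/gdrive/My Drive/Github/gbks/"
path="/gdrive/My Drive/Github/gbks/NC_004113.1.gbk"   # illustrative variable name

# Approach 1: close the single quotes, splice in the double-quoted variable,
# then reopen the single quotes. The shell joins the three pieces into one
# argument, so grep sees the expanded directory inside the lookbehind.
accession=$(printf '%s\n' "$path" | grep -oP '(?<='"$input_directory"').*(?=\.gbk)')
echo "$accession"

# Approach 2: no regex at all. Strip the directory prefix, then the suffix.
# Quoting the variable inside ${...#...} makes it match literally.
name=${path#"$input_directory"}   # -> NC_004113.1.gbk
name=${name%.gbk}                 # -> NC_004113.1
echo "$name"
```

Both echo lines print NC_004113.1 for this input. The parameter-expansion version is probably the more robust choice when the prefix and suffix are fixed strings, because nothing in the path is ever treated as a pattern.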

user1934428