I'm trying to write a shell script that extracts a string that occurs between two other strings using a regex lookaround (though please let me know if there's a better way).
The string I'm searching through is the path /gdrive/My Drive/Github/gbks/NC_004113.1.gbk (in reality I have several of these strings), and the part that I want to extract is NC_004113.1 (or whatever is in its place in another similar string). In other words, the part that I want to extract will always be flanked by /gdrive/My Drive/Github/gbks/ and .gbk.
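Since the flanking prefix and suffix are fixed, one possible alternative to a regex is plain shell parameter expansion. The following is only a minimal sketch, assuming the script runs under bash (or another POSIX-style shell); the variable names path and name are hypothetical and chosen just for illustration.

```shell
#!/usr/bin/env bash
# Assumed inputs: the directory prefix and one example path from the question.
input_directory="/gdrive/My Drive/Github/gbks/"
path="/gdrive/My Drive/Github/gbks/NC_004113.1.gbk"   # hypothetical variable name

# Remove the directory prefix from the front of the path.
# Quoting the variable inside the expansion makes it match literally.
name="${path#"$input_directory"}"

# Remove the .gbk suffix from the end of what is left.
name="${name%.gbk}"

echo "$name"
```

This avoids grep entirely, so spaces or regex metacharacters in the directory name should not need any escaping.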
I thought that a regex lookaround might work. To complicate things slightly, the directory prefix is stored in a variable. As a first test, I tried the following, just to see if it would run, which it did:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP "$input_directory"/.*
However, when I tried to do the same thing with a lookaround, the command failed:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory")'
As a sanity check, I tried passing the string directly as the expression, but it only worked when I omitted the quotation marks, like so:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?=/gdrive/My Drive/Github/gbks/)'
This line actually gave me the output that I wanted (though I need to modify it so I'm passing the string in as a variable):
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<=/gdrive/My Drive/Github/gbks/).*(?=.gbk)'
Ultimately, I think the code should look something like:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory").*(?=.gbk)'
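A likely reason the variable version fails is that the shell does not expand variables inside single quotes, so grep receives the literal text "$input_directory" rather than the path. A minimal sketch of one way around this, assuming GNU grep built with PCRE support (the -P option), is to put the whole pattern in double quotes. The \Q...\E wrapper is PCRE syntax that treats the enclosed text literally, and the dot in .gbk is escaped so it only matches a real dot. The variable name result is hypothetical.

```shell
#!/usr/bin/env bash
input_directory="/gdrive/My Drive/Github/gbks/"

# Double quotes let the shell expand ${input_directory} before grep sees the pattern.
# \Q...\E makes PCRE treat the expanded directory as literal text.
# Assumption: the directory itself never contains the sequence \E.
result=$(echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" \
  | grep -oP "(?<=\Q${input_directory}\E).*(?=\.gbk)")   # hypothetical variable name

echo "$result"
```

Because .* is greedy, the match runs up to the last .gbk in the line, which is what you want when the name itself contains dots.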
Thanks in advance!
-Rob