1

I recently made a request against the Google Cloud Service API endpoint and used wget to download a lot of files into a single folder. Because every sub-directory separator / is replaced by %2F and ?alt=media is appended, all the downloaded file names are contaminated with these strings, e.g.

hg38%2Fv0%2FHomo_sapiens_assembly38.dict?alt=media
hg19%2Fv0%2FHomo_sapiens_assembly19.fasta.alt?alt=media

I tested the following in bash and it returned the result I wanted:

echo "$hg19%2Fv0%2FHomo_sapiens_assembly19.fasta.alt?alt=media" | sed -e "s/^$hg19%2Fv0%2F//" -e "s/\?.*//g"

i.e. Homo_sapiens_assembly19.fasta.alt. Unfortunately when I scaled it up using,

for file in *; do 
    mv "$file" '$(echo "$file" | sed -e "s/^$hg19%2Fv0%2F//" -e "s/\?.*//g")' ; 
done

all the files turned into 1 file named "$file". I couldn't figure out why.

Can anyone provide a solution to my problem? And if some of the files contain a different number of "%2F" repeats, how can I elegantly keep only the string after the last "%2F" and strip the "?alt=media" from the end in the same line?

Thank you in advance.

Barmar

2 Answers

1

To keep only the part after the last %2F and drop the ?alt=media suffix, you can do this:

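echo "hg38%2Fv0%2FHomo_sapiens_assembly38.dict?alt=media" | sed -e 's/.*%2F\([^%]*\)?alt.*/\1/'
  • ".*%2F" matches everything up to and including the last occurrence of "%2F", because .* is greedy.
  • "\([^%]*\)" captures the following run of characters that are not "%".
  • "?alt.*" matches the literal string "?alt" followed by any characters. The ? is left unescaped because GNU sed reads \? as a "zero or one" quantifier in basic regular expressions.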

The result is:

Homo_sapiens_assembly38.dict 
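
As a quick check, the same idea can be wrapped in a small function and run against both sample names from the question. This is only a sketch: the function name clean_name is made up for illustration, and it writes the question mark as a plain ? so that sed treats it as a literal character in a basic regular expression.

```bash
# clean_name is a hypothetical helper, not part of the original answer.
clean_name() {
    # .*%2F is greedy, so it consumes everything up to the last %2F;
    # the captured group then stops just before the literal "?alt".
    printf '%s\n' "$1" | sed -e 's/.*%2F\([^%]*\)?alt.*/\1/'
}

clean_name 'hg38%2Fv0%2FHomo_sapiens_assembly38.dict?alt=media'
clean_name 'hg19%2Fv0%2FHomo_sapiens_assembly19.fasta.alt?alt=media'
```

The second name is a useful test because the wanted part itself ends in ".alt"; only the text from "?alt" onwards is removed.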

For the loop, put the command substitution in double quotes so that it is actually run, something like this:

for file in *
 do mv "$file" "$(echo "$file" | sed -e 's/.*%2F\([^%]*\)?alt.*/\1/')"
done
Freeman
1

Use .* to match everything up to the last %2F.

Put the command substitution inside double quotes, not single quotes. See Difference between single and double quotes in Bash
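
A short illustration of that difference, using one of the file names from the question (the variable names here are just for this sketch): inside single quotes the $(...) text is kept literally, while inside double quotes it is run and replaced by its output.

```bash
file='hg38%2Fv0%2FHomo_sapiens_assembly38.dict?alt=media'

# Single quotes: nothing is expanded, so every loop iteration would
# pass the same literal string to mv as the target.
single='$(echo "$file")'

# Double quotes: the command substitution runs and yields the file name.
double="$(echo "$file")"

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```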

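Don't put $ before hg at the beginning. Inside double quotes the shell treats $hg19 as a variable reference, which expands to an empty string if the variable is unset.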

It's not a requirement, but sed commands are usually put in single quotes, unless you're using variables in the substitution.

for file in *; do 
    mv "$file" "$(echo "$file" | sed -e 's/^hg.*%2F//' -e 's/\?.*//g')" ; 
done
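
If spawning sed for every file is a concern, the same two steps can be sketched with shell parameter expansion alone. This is an alternative technique, not what the loop above uses; the function name strip_gcs_name is invented here, and the loop only prints the mv commands, so nothing is renamed until the echo is removed.

```bash
# strip_gcs_name is a hypothetical helper name.
strip_gcs_name() {
    name=${1##*%2F}    # drop the longest prefix ending in %2F
    name=${name%%\?*}  # drop everything from the first ? onwards
    printf '%s\n' "$name"
}

for file in *%2F*; do
    [ -e "$file" ] || continue  # skip the unexpanded pattern if nothing matches
    # Dry run: prints the command instead of executing it.
    echo mv -- "$file" "$(strip_gcs_name "$file")"
done
```

Restricting the glob to *%2F* leaves files that were never mangled out of the loop.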
Barmar