
I have a string /home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A_1.fq.gz and I am using awk to split the string:

echo  /home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta.gz | awk -F'.[^.]*$' '{ print $1 }'

Which returns:

/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta

But I want it to return:

/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A

How do I do this?

RavinderSingh13
Lamma
  • Just use .fasta as the delimiter: `awk -F ".fasta" '{ print $1}'` – Dexirian Jan 16 '20 at 15:52
  • This would work but the file extension might not always be the same for me. – Lamma Jan 16 '20 at 15:55
  • Does this answer your question? [Extract filename and extension in Bash](https://stackoverflow.com/questions/965053/extract-filename-and-extension-in-bash) – Corentin Limier Jan 16 '20 at 16:14
  • What's the rule then to remove some parts of the string? – Nico Haase Jan 16 '20 at 16:37
  • @CorentinLimier This seems like it would work as long as the filename had no "." other than in the file extension, which unfortunately mine sometimes does :( – Lamma Jan 17 '20 at 08:16

2 Answers


Could you please try the following? You can use bash parameter expansion, which here removes the shortest suffix that starts with an underscore.

val="/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A_1.fq.gz"
echo "${val%_*}"

Output will be as follows.

/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A


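EDIT: As per anubhava's comment, if the filename has no underscore, then the expansion above cuts at the underscore in short_reads instead. In that case, try the following rev + awk solution, which removes the last two dot-separated fields.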

echo "$val" | rev | awk 'BEGIN{FS=OFS="."} {$1=$2="";sub(/^\.+/,"");print $0}' | rev


EDIT2: Adding a sed + rev solution.

echo "$val" | rev | sed 's/[^.]*\.[^.]*\.\(.*\)/\1/' | rev
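
As a possible alternative to the rev pipelines above, parameter expansion can also strip the last two dot-separated suffixes directly, whatever the extensions are. This is only a minimal sketch: the function name strip_two_ext is hypothetical, and it assumes the filename itself contains at least two dots (otherwise the pattern may match a dot in a directory name, or nothing at all).

```shell
#!/usr/bin/env bash

# strip_two_ext is a hypothetical helper name, not from the original answer.
# "${1%.*.*}" removes the shortest suffix matching the glob ".*.*",
# i.e. the last two dot-separated parts such as ".fasta.gz" or ".fq.gz".
strip_two_ext() {
    printf '%s\n' "${1%.*.*}"
}

strip_two_ext "/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta.gz"
# prints /home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A

strip_two_ext "/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A_1.fq.gz"
# prints /home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A_1
```

Note that the second call keeps the _1 part, because only dot-separated suffixes are removed; combine it with ${val%_*} if that part should go as well.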
RavinderSingh13
  • This indeed works. Is this simply a better method than awk? – Lamma Jan 16 '20 at 15:54
  • @Lamma, for the sample shown, IMHO yes, it is the better command. – RavinderSingh13 Jan 16 '20 at 15:55
  • What are the limitations of this method and what would I google to find out more? – Lamma Jan 16 '20 at 15:57
  • Hi @RavinderSingh13: If input is `val='/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta.gz'` then this will give `/home/lamma/local-blast/termitomycesBGI/short` which may not be what OP wants. – anubhava Jan 16 '20 at 15:58
  • @anubhava, thanks sir, I have added an EDIT solution now; please let me know if it looks good. – RavinderSingh13 Jan 16 '20 at 16:05
  • @Lamma, it is called parameter expansion in bash; see https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html, which has great information. I also added an awk solution here; let me know if you have any more queries. – RavinderSingh13 Jan 16 '20 at 16:11

Split a string by awk and print everything but the last two splits

You may use this awk:

awk 'BEGIN{FS=OFS="."} {$NF=$(NF-1)=""; NF-=2} 1' <<< '/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta.gz'

/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A
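
The command above relies on awk rebuilding the record when NF is decreased, which some awk versions reportedly do not do. As a hedged alternative, the same result can be sketched with sub() and a regular expression anchored at the end of the line. The function name strip_two_ext_awk is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash

# strip_two_ext_awk is a hypothetical helper name, not from the original answer.
# The regex \.[^.]*\.[^.]*$ matches the last two dot-separated parts of the line;
# sub() deletes them, and the trailing 1 prints the (possibly modified) record.
strip_two_ext_awk() {
    printf '%s\n' "$1" | awk '{ sub(/\.[^.]*\.[^.]*$/, "") } 1'
}

strip_two_ext_awk "/home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A.fasta.gz"
# prints /home/lamma/local-blast/termitomycesBGI/short_reads/F19FTSEUHT1394.IC0035-2A
```

If the input has fewer than two dots, the regex does not match and the line is printed unchanged.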
anubhava