
I am trying to extract substrings from many strings with bash. However, although the prefix is correctly deleted, the suffix is not.

One of the strings:

lcl|MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]

The desired output:

MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]

The code

 for row in $colonna2; do tmp=${row#*lcl|}
 colonna2_newname=${tmp%exception=*} echo $colonna2_newname; done

The output

MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]

Any idea why the suffix is not deleted? Is there an error in my syntax?

Thanks in advance

Claudio21
  • I can't reproduce this, perhaps also because we don't know what `colonna2` contains. If you are reading a file, try `sed 's/.*lcl|\(.*\)exception=/\1/'` – tripleee Sep 10 '21 at 10:33
  • If you genuinely don't have a newline or a semicolon before the `echo`, the line gets parsed before the variable is set. – tripleee Sep 10 '21 at 10:35

1 Answer


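You have the parameter expansion mostly right. The main problem is that there is no line break or semicolon after the colonna2_newname assignment, so the shell treats the assignment as a temporary environment variable for the echo command, and $colonna2_newname is expanded before that assignment takes effect.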

You will also want to change the pattern in the colonna2_newname assignment from ${tmp%exception=*} to ${tmp%_[exception=*}; otherwise the leading _[ of the exception field is left at the end of the result.

for row in $colonna2
do
  tmp="${row#*lcl|}"
  colonna2_newname="${tmp%_[exception=*}"
  echo "$colonna2_newname"
done

# output:
# MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]
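
To see why the missing separator matters, here is a minimal sketch using the sample string from the question. It assumes bash and a shell in which colonna2_newname is not already set; the names broken and fixed are only for illustration. The [ is escaped here as a precaution so it is never read as the start of a bracket expression.

```shell
#!/usr/bin/env bash
# Sample header taken from the question.
row='lcl|MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]'
tmp=${row#*lcl|}

# Start from a clean state so the comparison is fair.
unset colonna2_newname

# No separator: the assignment only becomes part of echo's environment,
# and "$colonna2_newname" is expanded before it exists, so echo prints an empty line.
broken=$(colonna2_newname=${tmp%_\[exception=*} echo "$colonna2_newname")

# With a semicolon: the assignment completes first, then echo sees the value.
fixed=$(colonna2_newname=${tmp%_\[exception=*}; echo "$colonna2_newname")

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
```

The first printf shows an empty pair of brackets, the second shows the desired output from the question.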

Now about the for loop: if any of the lines in your $colonna2 variable contain whitespace, the unquoted expansion is word-split, so for iterates over each space-separated piece rather than over whole lines. for loops are better suited to arrays and globbed filenames/pathnames; while read loops are the better choice for lines of text:

while IFS= read -r row
do
  tmp="${row#*lcl|}"
  colonna2_newname="${tmp%_[exception=*}"
  echo "$colonna2_newname"
done <<< "$colonna2"
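
The difference between the two loops can be checked with a small sketch. The two-line input below is hypothetical (it is not from the question) and was chosen only because its first line contains a space; bash is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical input: two lines, the first one containing a space.
lines=$'first line\nsecond'

# Unquoted on purpose: the expansion is split on spaces and newlines.
for_count=0
for item in $lines; do
  for_count=$((for_count + 1))
done

# read consumes one whole line per iteration, spaces included.
read_count=0
while IFS= read -r item; do
  read_count=$((read_count + 1))
done <<< "$lines"

echo "for saw $for_count items, while read saw $read_count lines"
```

The for loop counts three items while the while read loop counts two lines, which is the behavior described above.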
some coder guy
  • The `while read` is an improvement over [reading lines with `for`](http://mywiki.wooledge.org/DontReadLinesWithFor) which is inherently broken, but it is also rather suboptimal; see [`while read` loop extremely slow compared to `cat`, why?](https://stackoverflow.com/questions/13762625/bash-while-read-loop-extremely-slow-compared-to-cat-why) – tripleee Sep 10 '21 at 11:31
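
For larger inputs, the sed approach suggested in the comments avoids the shell loop entirely. The following is only a sketch, assuming each header sits on its own line; here the single sample row from the question is piped in with printf, but a file could be passed to sed instead.

```shell
#!/usr/bin/env bash
# Sample header taken from the question.
row='lcl|MK087647.1_cds_QHD46953.1_7_[gene=rpl2]_[protein=ribosomal_protein_L2]_[exception=RNA_editing]_[protein_id=QHD46953.1]_[location=complement(71768..73444)]_[gbkey=CDS]'

# First expression drops everything up to and including "lcl|";
# second drops "_[exception=" and everything after it.
result=$(printf '%s\n' "$row" | sed -e 's/.*lcl|//' -e 's/_\[exception=.*//')

printf '%s\n' "$result"
```

Lines without an exception field pass through the second expression unchanged, which matches what the ${tmp%...} expansion does when the pattern is absent.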