0

My bash script is the following:

#!/bin/bash
if [ ! -f "$1" ]; then
  exit
fi
while read line;do
  str1="[GAC]*T"
  num=$"(echo $line | tr -d -c 'T' | wc -m)"
  for((i=0;i<$num;i++))do
    echo $line | sed "s/$str1/&\n/" | head -n1 -q
    str1="${str1}[GAC]*T"
  done
  str1="[GAC]*T"
done < "$1

It works as intended when it prints to the terminal: it takes a filename as input and, for each line, prints the line up to the first letter T, then up to the next letter T, and so on.

Input:

GATTT
ATCGT

Output:

GAT
GATT
GATTT
AT
ATCGT

When I run the script with | tee outputfile, the output file is correct, but when I run it with > outputfile the terminal hangs and the script never finishes. Moreover, it works with bash -x scriptname inputfile > outputfile but hangs with bash scriptname inputfile > outputfile.

chepner
  • If you interrupt this after it "should be done", does the outputfile contain the expected output? – telina Apr 17 '18 at 13:25
  • Try with ./scriptname inputfile > outputfile – Abhijit Pritam Dutta Apr 17 '18 at 13:32
  • @AbhijitPritam Why would you expect that to help? – 123 Apr 17 '18 at 13:35
  • @Abir Is the quote missing at the end a typo? – 123 Apr 17 '18 at 13:35
  • Something like `awk -v RS=T '{print}' "$1"` would appear to do the same thing much more efficiently. Can you provide a sample input with its expected output? – chepner Apr 17 '18 at 14:11
  • Checking for the existence of the file as you are doing is introducing more problems than it's solving. If the file does not exist, your program exits silently, but successfully. If you simply omit those 3 lines of code, you get much more reasonable behavior (the script fails with an error message of the form "No such file or directory"). – William Pursell Apr 17 '18 at 14:40
  • Yes, the quote missing at the end is a typo. – Abir Shaked Apr 17 '18 at 16:58
  • @telina Yes, if I close the terminal, reopen it, and check outputfile, it contains what it should. – Abir Shaked Apr 17 '18 at 16:59
  • @chepner For example, input: GATTT ATCGT, output: GAT GATT GATTT AT ATCGT. Each string should be on its own line, but the comment keeps getting reformatted when I post it. – Abir Shaked Apr 17 '18 at 17:08
  • Works without problem on bash 4.3 in Ubuntu 14.04. If only you take out the pair of double quotes in the num= line (and add a double quote after $1 at the end of course). num=$(some command) should work. – Gerrit Apr 17 '18 at 20:38

2 Answers

1

I made modifications to your original script, please try:

if [ ! -f "$1" ]; then
  exit
fi
while IFS='' read -r line || [[ -n "$line" ]];do
  str1="[GAC]*T"
  num=$(echo $line | tr -d -c 'T' | wc -m)
  for((i=0;i<$num;i++));do
    echo $line | sed "s/$str1/&\n/" | head -n1 -q
    str1="${str1}[GAC]*T"
  done
  str1="[GAC]*T"
done < "$1"

For input:

GATTT
ATCGT

This script outputs:

GAT
GATT
GATTT
AT
ATCGT

Modifications made to your original script were:

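  • Line while read line; do changed to while IFS='' read -r line || [[ -n "$line" ]]; do. IFS='' keeps leading and trailing whitespace, -r stops read from interpreting backslashes, and || [[ -n "$line" ]] still processes a final line that lacks a trailing newline. This is explained in: Read a file line by line assigning the value to a variable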

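  • Line num=$"(echo $line | tr -d -c 'T' | wc -m)" changed to num=$(echo $line | tr -d -c 'T' | wc -m). In bash, $"..." is a translatable double-quoted string, not a command substitution, so the original stored the text of the pipeline in num instead of running it.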

  • Line for((i=0;i<$num;i++))do changed to for((i=0;i<$num;i++));do

  • Line done < "$1 changed to done < "$1"
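
To illustrate the second modification, here is a minimal sketch of how the two forms differ in bash. The variable names quoted and count are made up for this example, and it assumes no message catalog is installed that would translate the string.

```shell
#!/bin/bash
line=GATTT

# $"..." is a locale-translatable double-quoted string: $line is expanded,
# but the pipeline inside it is never executed.
quoted=$"(echo $line | tr -d -c 'T' | wc -m)"

# $(...) is command substitution: the pipeline runs and its output is captured.
count=$(echo "$line" | tr -d -c 'T' | wc -m)

printf 'quoted=%s\n' "$quoted"
printf 'count=%s\n' "$count"
```

Only the second form yields a number that the for loop can compare against.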

Now you can do: ./scriptname inputfile > outputfile
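
As a variation, the same output can be produced without starting sed and head for every T. This is only a sketch using bash substring expansion, not the method from the question; the function names prefixes_ending_in_t and split_on_t are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every prefix of the argument that ends in 'T'.
prefixes_ending_in_t() {
  local line=$1 i
  for ((i = 0; i < ${#line}; i++)); do
    # ${line:i:1} is the character at offset i.
    if [[ ${line:i:1} == T ]]; then
      # ${line:0:i+1} is the prefix up to and including that character.
      printf '%s\n' "${line:0:i+1}"
    fi
  done
}

# Read standard input line by line, including a final line
# that has no trailing newline.
split_on_t() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    prefixes_ending_in_t "$line"
  done
}

# Usage (inputfile and outputfile as in the question):
#   split_on_t < inputfile > outputfile
```

Because the loop runs entirely inside bash, it also avoids relying on GNU-specific behaviour such as \n in a sed replacement.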

builder-7000
  • Just realized that `for((i=0;i<$num;i++))do` is valid syntax but still I prefer to put semicolon before *do*: `for((i=0;i<$num;i++));do`. – builder-7000 Apr 17 '18 at 20:26
0

Try:

sed -r 's/([^T]*T+)/\1\n/g' gatc.txt > outputfile

instead of your script.

It takes some optional non-Ts, followed by at least one T and inserts a newline after the T.

cat gatc.txt 
GATGATTGATTTATATCGT
sed -r 's/([^T]*T+)/\1\n/g' gatc.txt 
GAT
GATT
GATTT
AT
AT
CGT

For multi-line input, pipe the result through a second sed to delete the empty lines left after each input line:

echo "GATTT
ATCGT" |  sed -r 's/([^T]*T+)/\1\n/g;' | sed '/^$/d'
GATTT
AT
CGT
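
A possible alternative that needs no second pass is grep with -o, which prints each match on its own line and so never produces the empty lines. This swaps sed for grep and is only a sketch; the function name split_runs is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print each run of optional non-T characters
# followed by one or more T on its own line.
split_runs() {
  # -o prints only the matched parts, one per line; -E enables "+".
  grep -oE '[^T]*T+'
}

# Usage:
#   split_runs < gatc.txt > outputfile
```

Like the sed command, it prints the individual pieces between runs of T, not the cumulative prefixes shown in the question.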
user unknown