I have 20 files from which I want to grep all the lines that contain a given id (e.g. id123) and save them in a new text file. In the end, I would have several txt files, as many as there are ids.

If you have a small number of ids, you can create a script with the list inside, e.g.:

list=("id123" "id124" "id125" "id126")

for i in "${list[@]}"
do
   zgrep -Hx $i *.vcf.gz > /home/Roy/$i.txt
done

This would give us 4 txt files (id123.txt, id124.txt, etc.).

However, my list has around 500 ids, so it would be much easier to read the txt file that stores the ids and iterate through it.

I was trying to do something like:

list = `cat some_data.txt`

for i in "${list[@]}"
do
   zgrep -Hx $i *.vcf.gz > /home/Roy/$i.txt
done

However, this only provides the last id of the file.

RoyBatty
  • If each id in the file is on a distinct line, you can do `while read i; do ...; done < panel_genes_cns.txt` – William Pursell Oct 06 '22 at 13:36
  • Thanks @WilliamPursell. I don't think I was clear: I need to have as many files as ids – RoyBatty Oct 06 '22 at 13:38
  • You were clear, and the solution I propose gives you one output file for each id. But this is really not the best way to approach the problem, since you're re-reading the files far more often than you need to. You could do this all with a single pass with awk. – William Pursell Oct 06 '22 at 13:43

2 Answers

If each id in the file is on a distinct line, you can do

while read i; do ...; done < panel_genes_cns.txt

If that is not the case, you can simply massage the file to make it so:

tr -s '[:space:]' '\n' < panel_genes_cns.txt | while read i; do ...; done

There are a few caveats to be aware of. In both forms, the commands inside the loop inherit the same input stream that `read` reads from, so any command that reads standard input may consume ids unexpectedly. In the second form, the pipeline will (depending on the shell) run the loop in a subshell, and any variables defined in the loop will be out of scope after the loop ends. But for your simple case, either of these should work without worrying too much about these issues.
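For illustration, here is a filled-in sketch of the first form. It is only a sketch under some assumptions: the temporary directory, `ids.txt`, `out_dir`, and the two tiny sample archives are hypothetical stand-ins for your real id list, output directory, and `*.vcf.gz` files. Redirecting `zgrep`'s standard input from `/dev/null` is a precaution against the first caveat, and the `count` variable survives the loop because the loop is fed by a redirection, not a pipeline.

```shell
#!/usr/bin/env bash
# Sketch: one output file per id, with the ids read from a file (one per line).
# All names below (workdir, ids.txt, out_dir, sample archives) are hypothetical.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny sample data standing in for the real id list and *.vcf.gz files.
printf 'id123\nid124\n' > ids.txt
printf 'id123\nother\n' | gzip > a.vcf.gz
printf 'id124\nid123\n' | gzip > b.vcf.gz

out_dir=$workdir/out
mkdir -p "$out_dir"

count=0
while IFS= read -r id; do
    [ -n "$id" ] || continue    # skip blank lines in the id list
    # < /dev/null: zgrep cannot consume the remaining ids from the loop's stdin.
    # || true: zgrep may exit non-zero when some file has no match; not an error here.
    zgrep -Hx -e "$id" *.vcf.gz > "$out_dir/$id.txt" < /dev/null || true
    count=$((count + 1))
done < ids.txt

# count is still visible here because the loop did not run in a subshell.
echo "processed $count ids"
```

With your real data you would drop the sample-data lines, point the redirection at `panel_genes_cns.txt`, and set `out_dir` to wherever the per-id files should go.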

William Pursell

I did not check the whole code, but at first glance I can see you are using the wrong redirection.

You have to use >> instead of >.

> overwrites the file and >> appends to it.

# Read the whitespace-separated ids into an array (no spaces around =)
list=( $(cat panel_genes_cns.txt) )

for i in "${list[@]}"
do
   zgrep -Hx "$i" *.vcf.gz >> "/home/Roy/$i.txt"
done
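
As a quick way to see the difference between the two redirections, here is a minimal sketch. The scratch file `demo` is a hypothetical name used only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: > truncates the target file, >> appends to it.
demo=$(mktemp)    # hypothetical scratch file

echo "first"  > "$demo"    # file holds: first
echo "second" > "$demo"    # > truncates first, so the file holds only: second
echo "third" >> "$demo"    # >> appends, so the file holds: second, third

cat "$demo"
```

Keep in mind that with >> a second run of the loop adds to whatever is already in the per-id files, so you may want to remove old output before rerunning.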
aze2201