Store txt files separately for each subcategory

Question

I have several experiments. Each experiment has several replicate files. I want to place all these replicate files into one text file in the following way.

Let's say there are 3 experiments and each experiment has 2 replicate files for each ID (there can be more experiments and replicates than this):

/home/data/study1/EXP1_30/EXP1_replicate_1_30.txt
/home/data/study1/EXP1_30/EXP1_replicate_2_30.txt
/home/data/study1/EXP1_60/EXP1_replicate_1_60.txt
/home/data/study1/EXP1_60/EXP1_replicate_2_60.txt
/home/data/study1/EXP2_30/EXP2_replicate_1_30.txt
/home/data/study1/EXP2_30/EXP2_replicate_2_30.txt
/home/data/study1/EXP2_60/EXP2_replicate_1_60.txt
/home/data/study1/EXP2_60/EXP2_replicate_2_60.txt
/home/data/study1/EXP3_30/EXP3_replicate_1_30.txt
/home/data/study1/EXP3_30/EXP3_replicate_2_30.txt
/home/data/study1/EXP3_60/EXP3_replicate_1_60.txt
/home/data/study1/EXP3_60/EXP3_replicate_2_60.txt

The output file1.txt should look like this:

/home/data/study1/EXP1/EXP1_replicate_1_30.txt,/home/data/study1/EXP1/EXP1_replicate_2_30.txt \
/home/data/study1/EXP2/EXP2_replicate_1_30.txt,/home/data/study1/EXP2/EXP2_replicate_2_30.txt \
/home/data/study1/EXP3/EXP3_replicate_1_30.txt,/home/data/study1/EXP3/EXP3_replicate_2_30.txt

The output file2.txt should look like this:

/home/data/study1/EXP1/EXP1_replicate_1_60.txt,/home/data/study1/EXP1/EXP1_replicate_2_60.txt \
/home/data/study1/EXP2/EXP2_replicate_1_60.txt,/home/data/study1/EXP2/EXP2_replicate_2_60.txt \
/home/data/study1/EXP3/EXP3_replicate_1_60.txt,/home/data/study1/EXP3/EXP3_replicate_2_60.txt

....

My code with for loops:

ID=(30 60)
exp=("EXP1" "EXP2" "EXP3")

d=""
for  txtfile in /home/data/study1/${exp[0]}/${exp[0]}*_${ID[0]}.txt
do
    printf "%s%s" "$d" "$txtfile" 
    d=","
done
printf " \\" 
printf "\n" 

d=""
for txtfile in /home/data/study1/${exp[1]}/${exp[1]}*_${ID[0]}.txt
do

    printf "%s%s" "$d" "$txtfile" 
    d=","
done
printf " \\" 
printf "\n" 

d=""
for txtfile in /home/data/study1/${exp[2]}/${exp[2]}*_${ID[0]}.txt
do

    printf "%s%s" "$d" "$txtfile" 
    d=","
done

I am using for loops with hard-coded index numbers for each experiment and ID, which is very time-consuming. Is there an easier way?

If you want that output, why did your experiments output those files in the first place? — hek2mgl, Aug 07 '14 at 12:50
@hek2mgl those output files are coming from another pipeline and I have to process all the files together based on their IDs in this particular format — hash, Aug 07 '14 at 12:56
I will never understand why scientific programs produce output which can't be used by the scientist, unless being post-processed. — hek2mgl, Aug 07 '14 at 13:00
Can't you (or somebody else) change the process's output in order to produce files which can be read easily by many different applications? Can't you store the results in a database? At least the latter should be possible, given the information in the question. — hek2mgl, Aug 07 '14 at 13:09
@hek2mgl No, it's a well-known pipeline which is used by many other scientists, but the research study I am dealing with requires me to process the data differently. That's why I cannot change the way the pipeline is implemented. — hash, Aug 07 '14 at 13:17
Could you post what you're using at the moment? It sounds like it might be a perfectly valid approach. — Tom Fenech, Aug 07 '14 at 13:47

Tom Fenech · Accepted Answer · 2014-08-07T21:22:19.627

I think that this does what you want:

#!/bin/bash

ids=( 30 60 )
dir=/home/data/study1

# join glob matches with commas, append a space and a backslash
# modified from http://stackoverflow.com/a/3436177/2088135
join() { local IFS=,; echo "$* "'\'; } #' <- to fix syntax highlighting

i=0
for id in "${ids[@]}"; do
    s=$(for exp in "$dir"/EXP*"$id"; do join "$exp/"*"$id".txt; done)
    # trim off the final backslash and output to file
    echo "${s%?}" > file$((++i)).txt
done

Output (note that when testing, I set dir=.):

$ cat file1.txt 
./EXP1_30/EXP1_replicate_1_30.txt,./EXP1_30/EXP1_replicate_2_30.txt \
./EXP2_30/EXP2_replicate_1_30.txt,./EXP2_30/EXP2_replicate_2_30.txt \
./EXP3_30/EXP3_replicate_1_30.txt,./EXP3_30/EXP3_replicate_2_30.txt 
$ cat file2.txt 
./EXP1_60/EXP1_replicate_1_60.txt,./EXP1_60/EXP1_replicate_2_60.txt \
./EXP2_60/EXP2_replicate_1_60.txt,./EXP2_60/EXP2_replicate_2_60.txt \
./EXP3_60/EXP3_replicate_1_60.txt,./EXP3_60/EXP3_replicate_2_60.txt
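
For anyone who wants to try the idea without touching real data, here is a minimal, self-contained sketch of the same glob-per-ID approach. It is not the answer's exact code: the temporary demo tree and the `build_list` helper are assumptions made for illustration, and it skips the `join`/trim step by printing the ` \` separator only between lines, so the last line carries no trailing backslash or space.

```shell
#!/bin/bash
# Demo tree in a temporary directory (a stand-in for /home/data/study1).
dir=$(mktemp -d)
for e in EXP1 EXP2 EXP3; do
    for id in 30 60; do
        mkdir -p "$dir/${e}_$id"
        touch "$dir/${e}_$id/${e}_replicate_1_$id.txt" \
              "$dir/${e}_$id/${e}_replicate_2_$id.txt"
    done
done

# build_list (hypothetical helper): print one comma-joined line per
# experiment directory for the given ID, ending every line but the last in " \".
build_list() {
    local id=$1 first=1 d f line
    for d in "$dir"/EXP*_"$id"; do
        line=""
        for f in "$d"/*_"$id".txt; do
            line="${line:+$line,}$f"        # prepend a comma only when line is non-empty
        done
        [ "$first" = 1 ] || printf ' \\\n'  # close the previous line with " \"
        printf '%s' "$line"
        first=0
    done
    printf '\n'
}

# One output file per ID: file1.txt for 30, file2.txt for 60.
i=0
for id in 30 60; do
    i=$((i + 1))
    build_list "$id" > "$dir/file$i.txt"
done

cat "$dir/file1.txt"
```

Because the experiment directories and replicate files are both found by globbing, the sketch should cope with more experiments or replicates without changes; only the list of IDs is spelled out.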

hek2mgl · Answer 2 · 2014-08-07T14:19:14.380

You can use the following bash script:

#!/bin/bash 

i=0; n=0; files=""
sort -t_ -k5 files.txt | while read line ; do
    files="$files $line"
    i=$((i+1))
    if [ $((i%6)) -eq 0 ] ; then
        n=$((n+1))
        cat $files > "$n.txt"
        files=""
    fi
done

I guess you have opted for the more portable approach, but just in case you weren't already aware (and potentially for the benefit of the OP), bash permits you to use `files+="$line"`, `(( i += 1 ))` or `((++i))`, `(( i % 6 == 0 ))`, etc. – Tom Fenech Aug 07 '14 at 14:09
Yes, I had portability in mind when giving the answer – hek2mgl Aug 07 '14 at 14:18
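
To make the bash shorthand from the comment above concrete, here is a small sketch of the same batching loop written with `+=`, `((++i))` and `(( ... ))`. The file names, the batch size of 3 and the `out` variable are placeholders for illustration, not part of either answer. Note that the append needs an explicit leading space (`files+=" $line"`) to behave like `files="$files $line"`.

```shell
#!/bin/bash
# Group input lines into batches of 3 (hypothetical size; the answer uses 6).
batch=3
i=0
files=""
out=""

while read -r line; do
    files+=" $line"                  # append; same as files="$files $line"
    ((++i))                          # increment; same as i=$((i+1))
    if (( i % batch == 0 )); then    # same as [ $((i%batch)) -eq 0 ]
        out+="${files# }"$'\n'       # store the finished batch without its leading space
        files=""
    fi
done <<'EOF'
a.txt
b.txt
c.txt
d.txt
e.txt
f.txt
EOF

printf '%s' "$out"
```

Feeding the loop from a here-document (or a `< file` redirect) instead of a pipe keeps it in the current shell, so `out` and `i` are still set after `done`. With `sort ... | while read`, bash normally runs the loop in a subshell and those values are lost once the loop ends.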

David C. Rankin · Answer 3 · 2014-08-07T22:14:04.680

You can also make use of a subshell and do it from the command line (with the list of paths, one per line, in dat/experiment.txt):

$ ( first=0; cnt=0; grep 30 dat/experiment.txt | sort | while read line; do \
[ "$first" = 0 ] && first=1 || { [ "$cnt" = 0 ] && echo ' \'; }; echo -n $line; \
((cnt++)); [ "$cnt" = 1 ] && echo -n ","; [ "$cnt" = 2 ] && cnt=0; done; \
echo "" ) >outfile1.txt

$ ( first=0; cnt=0; grep 60 dat/experiment.txt | sort | while read line; do \
[ "$first" = 0 ] && first=1 || { [ "$cnt" = 0 ] && echo ' \'; }; echo -n $line; \
((cnt++)); [ "$cnt" = 1 ] && echo -n ","; [ "$cnt" = 2 ] && cnt=0; done; \
echo "" ) >outfile2.txt

Admittedly, the one-liner ended up longer than originally anticipated in order to match your line continuations exactly. If you omit the line continuations in the output files, it reduces to (e.g.):

$ (cnt=0; grep 30 dat/experiment.txt | sort | while read line; do echo -n $line; \
((cnt++)); [ "$cnt" = 1 ] && echo -n ","; [ "$cnt" = 2 ] && echo "" && cnt=0; \
done ) >outfile1.txt

output:

$ cat outfile1.txt
/home/data/study1/EXP1_30/EXP1_replicate_1_30.txt,/home/data/study1/EXP1_30/EXP1_replicate_2_30.txt \
/home/data/study1/EXP2_30/EXP2_replicate_1_30.txt,/home/data/study1/EXP2_30/EXP2_replicate_2_30.txt \
/home/data/study1/EXP3_30/EXP3_replicate_1_30.txt,/home/data/study1/EXP3_30/EXP3_replicate_2_30.txt \

$ cat outfile2.txt
/home/data/study1/EXP1_60/EXP1_replicate_1_60.txt,/home/data/study1/EXP1_60/EXP1_replicate_2_60.txt \
/home/data/study1/EXP2_60/EXP2_replicate_1_60.txt,/home/data/study1/EXP2_60/EXP2_replicate_2_60.txt \
/home/data/study1/EXP3_60/EXP3_replicate_1_60.txt,/home/data/study1/EXP3_60/EXP3_replicate_2_60.txt \
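
If the paths are already in a list file, the pairing can probably be done with standard tools instead of a hand-written counter. The following is only a sketch under some assumptions: exactly two replicates per experiment and ID, a temporary list file standing in for dat/experiment.txt, and a made-up helper name `pair_by_id`. It anchors the `grep` pattern to the `_ID.txt` suffix so that an ID appearing elsewhere in a path does not match, pairs consecutive lines with `paste`, and uses `sed` to add ` \` to every line except the last.

```shell
#!/bin/bash
# Hypothetical input list, one path per line (stand-in for dat/experiment.txt).
list=$(mktemp)
cat > "$list" <<'EOF'
/home/data/study1/EXP1_30/EXP1_replicate_1_30.txt
/home/data/study1/EXP1_30/EXP1_replicate_2_30.txt
/home/data/study1/EXP1_60/EXP1_replicate_1_60.txt
/home/data/study1/EXP1_60/EXP1_replicate_2_60.txt
/home/data/study1/EXP2_30/EXP2_replicate_1_30.txt
/home/data/study1/EXP2_30/EXP2_replicate_2_30.txt
/home/data/study1/EXP2_60/EXP2_replicate_1_60.txt
/home/data/study1/EXP2_60/EXP2_replicate_2_60.txt
EOF

# pair_by_id ID: select paths ending in _ID.txt, join each consecutive pair
# with a comma, then append " \" to every line except the last one.
pair_by_id() {
    grep "_$1\.txt\$" "$list" | sort | paste -s -d ',\n' - | sed '$!s/$/ \\/'
}

outdir=$(mktemp -d)
pair_by_id 30 > "$outdir/outfile1.txt"
pair_by_id 60 > "$outdir/outfile2.txt"

cat "$outdir/outfile1.txt"
```

The `paste -s -d ',\n'` call cycles through its delimiter list, so it alternates between a comma and a newline. That is what ties the sketch to two replicates per group; a different replicate count would need a different delimiter list or another approach.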
