
I am working on a Linux cluster. I have a list of files that I need to find:

Sample10
Sample22

Each of these files also has a name under a second, serial-number-based naming convention. A tab-separated file, key.tsv, lists both names for each file on a single row:

Sample10 Serial102
Sample22 Serial120

I need to find each file by one name and link it into another directory under its other ("Serial") name. This is my attempt:

for i in "Sample10" "Sample22"
do
    if [[ $(find /directory/ -name "$i*.fastq") ]]   # any matching file?
    then
        R1=$(find /directory/ -name "$i*.fastq")
        ln -s "$R1" /output/directory/"$i".fastq
    else
        echo "File existence failed"
    fi
done

This finds each file of interest from the list and links it, but I am stumped as to how to name the links based on the entries in the key.

learningbee
Paul
  • Typically you would use `mv` or `rename` to rename the files... – l'L'l Apr 28 '18 at 01:41
  • Yes, I am aware of this. The question is really how to do this renaming based on a tab-separated key. In plain English: for `Sample10`, look in the `key.tsv` file, find the corresponding `SerialID`, assign that serial ID to a variable, and then use that variable in place of `"$i"` in line 6 (the `ln -s` line). – Paul Apr 28 '18 at 01:47
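A minimal sketch of the procedure described in the comment above, written as a bash function. It assumes key.tsv has exactly two tab-separated columns (sample name, serial name) and that linking the first `<sample>*.fastq` file found is enough; `link_by_serial` and the temporary demo directories are hypothetical names chosen for illustration, not part of the original script.

```bash
#!/bin/bash
# Sketch: link each sample's .fastq file under its serial name, looking the
# serial name up in a key file (assumed format: <sample><TAB><serial> per row).
link_by_serial() {
    local key=$1 src_dir=$2 out_dir=$3
    shift 3
    local i serial R1
    for i in "$@"; do
        # print column 2 of the first row whose column 1 is exactly "$i"
        serial=$(awk -F'\t' -v s="$i" '$1 == s { print $2; exit }' "$key")
        if [[ -z $serial ]]; then
            echo "No serial name for $i in $key" >&2
            continue
        fi
        # "$i*.fastq" is a prefix match (Sample1 would also match Sample10...);
        # -print -quit keeps only the first file found
        R1=$(find "$src_dir" -name "$i*.fastq" -print -quit)
        if [[ -n $R1 ]]; then
            ln -s "$R1" "$out_dir/$serial.fastq"   # serial name replaces "$i"
        else
            echo "File existence failed for $i" >&2
        fi
    done
}

# Illustrative run on throwaway data in a temporary directory
work=$(mktemp -d)
mkdir -p "$work/data" "$work/out"
touch "$work/data/Sample10_R1.fastq" "$work/data/Sample22_R1.fastq"
printf 'Sample10\tSerial102\nSample22\tSerial120\n' > "$work/key.tsv"
link_by_serial "$work/key.tsv" "$work/data" "$work/out" Sample10 Sample22
ls -l "$work/out"
```

With real data you would call it as, for example, `link_by_serial key.tsv /directory /output/directory Sample10 Sample22`. The awk comparison `$1 == s` matches the whole first column, so `Sample1` would not pick up the `Sample10` row.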

3 Answers


You can achieve this with a single call to find, using an associative array to hold the mapping read from the key.tsv file:

#!/bin/bash

# build the hash for file mapping
declare -A file_map
while read -r src dst; do
  file_map["$src.fastq"]=$dst  # assume that the map doesn't have the .fastq extension
done < key.tsv

# loop through the files and link each one under its mapped name
while IFS= read -r -d '' path; do   # read the NUL-separated output of find
  base=${path##*/}             # get the basename
  dest=${file_map["$base"]}    # look up the hash to get dest name
  [[ $dest ]] || continue      # skip if no mapping was found
  echo "Creating a link for file $path"
  ln -s "$path" "/dest/dir/$dest.fastq"
done < <(find /path/to/source/files -type f -name '*.fastq' -print0)

I haven't tested this; I'll be happy to fix any issues you find.


codeforester

There are many ways to do this. awk is one way:

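Given a key.tsv whose rows have the form

source dest

you can look up the destination name for the sample name in `$i` with:

destination=$(awk -F'\t' -v src="$i" '$1 == src {print $2}' key.tsv)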

Alternatively, use grep and cut in an analogous fashion.
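A sketch of the grep/cut variant, assuming key.tsv is tab-separated with the sample name in the first column. `serial_for` is a hypothetical helper name; anchoring the pattern with `^` and a literal tab keeps `Sample1` from matching the `Sample10` row, and sample names are assumed to contain no regex metacharacters.

```bash
#!/bin/bash
# Print the serial name (column 2) for the sample name given as $1.
# Assumes the key file is tab-separated: <sample><TAB><serial>.
serial_for() {
    local sample=$1 key=${2:-key.tsv}
    # ^ and the literal tab anchor the match to the whole first column;
    # cut -f2 uses tab as its default delimiter; head keeps the first match
    grep "^${sample}"$'\t' "$key" | cut -f2 | head -n 1
}

# Illustrative key file in a temporary location
key=$(mktemp)
printf 'Sample10\tSerial102\nSample22\tSerial120\n' > "$key"

destination=$(serial_for Sample10 "$key")
echo "$destination"   # prints Serial102
```

Inside the question's loop this would become `destination=$(serial_for "$i")`, with `"$destination"` used in place of `"$i"` in the link name.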

Leonard

I'm not here to answer your homework, so I'll give the general idea.

You'll need to loop through the whole TSV file. I suggest using Python; for example, use the approach from this answer:

How to iterate through all the rows in a tsv file?

For each line, you'll have to find the corresponding value (a row is usually split into an array, so the corresponding value is LINE[1]) and check that the file exists. Below is sample code for that step in bash (find the equivalent in Python; perhaps you can use some sort of exec command).

find . -name "${LINE[0]}" -exec rename "s/^${LINE[1]}_//" {} +
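The same "loop over every row of the key" idea can also be done in plain bash, without Python. A minimal sketch, assuming a tab-separated key.tsv (sample, serial) and that linking the first `<sample>*.fastq` file found is acceptable; it creates symlinks as the question asks rather than renaming, and `link_from_key` plus the temporary directories are hypothetical, for illustration only.

```bash
#!/bin/bash
# Sketch: walk every row of the key file and link the matching sample file
# under its serial name (assumed format: <sample><TAB><serial> per row).
link_from_key() {
    local key=$1 src_dir=$2 out_dir=$3
    local sample serial path
    # IFS=$'\t' splits each row on the tab; "|| [[ -n $sample ]]" still
    # processes a last row that has no trailing newline
    while IFS=$'\t' read -r sample serial || [[ -n $sample ]]; do
        [[ -n $sample && -n $serial ]] || continue   # skip blank or one-column rows
        # existence check: first file whose name starts with the sample name
        path=$(find "$src_dir" -name "$sample*.fastq" -print -quit)
        if [[ -n $path ]]; then
            ln -s "$path" "$out_dir/$serial.fastq"
        else
            echo "No file found for $sample" >&2
        fi
    done < "$key"
}

# Illustrative run on throwaway data in a temporary directory
work=$(mktemp -d)
mkdir -p "$work/data" "$work/out"
touch "$work/data/Sample10.fastq"          # no file for Sample22 on purpose
printf 'Sample10\tSerial102\nSample22\tSerial120' > "$work/key.tsv"   # no final newline
link_from_key "$work/key.tsv" "$work/data" "$work/out"
```

Driving the loop from key.tsv means every sample in the key is handled; if only a subset is needed, filter the key first or use a per-sample lookup instead.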

Joe P.
  • "Im not here to answer your homework, so I'll give the general idea." - Then don't post an answer. – SteveK Apr 28 '18 at 01:59