
I am trying to copy files from one directory (defined as $inDir below) to another (defined as $outDir below) if they 1) exist and 2) have more than 1 line in the file (this is to avoid copying files that are empty text files). I am able to do the first part using the code below, but am struggling with how to do the second part. I'm guessing it might involve awk and NR somehow, but I'm not very good at coding in Bash, so any help would be appreciated. I'd like this to be incorporated into the loop below if possible, so that it can be done in one step.

for i in $inDir/NVDI_500m_mean_distance_*_40PCs; do
    batch_name_dir=$i;
    batch_name=$(basename $i);
    if [ ! -f $outDir/${batch_name}.plink.gz ]; then
            echo 'Copying' $batch_name;
            find $batch_name_dir -name ${batch_name}.plink.gz -exec cp {} $outDir/${batch_name}.plink.gz \;
    else
            echo $batch_name 'already exists'
    fi
done
user5481267
  • "...2) have more than 1 line in the file (this is to avoid copying files that are empty text files)..." You can use "-s" (the same way you use "-f" to check the existence of a file in the output dir) to check the "emptyness" of a file. – danrodlor May 13 '19 at 10:25
  • But if you really want to check the number of lines (you shouldn't), just use `wc -l ` – Aaron May 13 '19 at 10:31
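
The `-s` test suggested in the comments can be used the same way as the existing `-f` check. A minimal sketch (the file name is only a placeholder):

# -s is true when the file exists and its size is greater than zero
if [ -s "$inDir/somefile.txt" ]; then
    echo "file exists and is not empty"
fi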

2 Answers


You can use `wc -l` to check how many lines are in a file and `awk` to strip only the number from the result.

lines=$(wc -l $YOUR_FILE_NAME | awk '{print $1}')

if [ "$lines" -gt 0 ]; then
    # copy the file
fi

Edit: I have corrected LINES to lines according to the comments below.
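
Folding this check into the loop from the question could look roughly like the sketch below. The `src` variable and the `find ... | head -n 1` step are assumptions used to pick out the single matching file in the batch sub-folder, and `-gt 1` matches the "more than 1 line" requirement from the question; note that if the `.plink.gz` files are gzip-compressed, the count would have to be taken from the decompressed stream instead.

for i in "$inDir"/NVDI_500m_mean_distance_*_40PCs; do
    batch_name=$(basename "$i")
    if [ -f "$outDir/${batch_name}.plink.gz" ]; then
        echo "$batch_name already exists"
        continue
    fi
    # Locate the file inside the randomly named sub-folder (assumes one match)
    src=$(find "$i" -name "${batch_name}.plink.gz" | head -n 1)
    # Skip batches where no such file was found
    [ -z "$src" ] && continue
    # Require more than one line before copying
    lines=$(wc -l < "$src")
    if [ "$lines" -gt 1 ]; then
        echo "Copying $batch_name"
        cp "$src" "$outDir/${batch_name}.plink.gz"
    fi
done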

Nathan TM
  • Don't forget to properly assign the result of the pipe to the **LINES** variable: `LINES=$(wc -l $YOUR_FILE_NAME | awk '{print $1}')` – danrodlor May 13 '19 at 10:38
  • or eliminate the need for awk at all with `lines=$(wc -l < "$your_file_name")`. Note that ALL_CAP variable names are reserved by convention to system variables. Good luck. – shellter May 13 '19 at 12:31
  • This seemed to work: `for i in $inDir/Mean_age_*_15PCs; do batch_name_dir=$i; batch_name=$(basename $i); LINES=$(wc -l $batch_name_dir/*/${batch_name}.plink.gz | awk '{print $1}'); if [ $LINES -gt 0 ]; then find $batch_name_dir -name ${batch_name}.plink.gz -exec cp {} $outDir/${batch_name}.plink.gz \; fi done` – user5481267 May 13 '19 at 13:56
  • See [Correct Bash and shell script variable capitalization](https://stackoverflow.com/q/673055/4154375) for details of why it is best not to use all-uppercase variable names. `LINES` is a good example of a variable name that is likely to clash with an environment variable. See [Environment Variables (The Open Group Base Specifications Issue 7)](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html). – pjh May 13 '19 at 18:33

I propose this:

for f in "$(find $indir -type f -name 'NVDI_500m_mean_distance_*_40PC' -not -empty)"; 
do 
  cp "$f" /some/targetdir; 
done

`find` is faster than `wc` when all you need is a check for zero size.

Subjectively, I consider it more readable than the other solution.

However, the explicit loop is not necessary:

find "$indir" -type f -name 'NVDI_500m_mean_distance_*_40PC' -not -empty |\
    xargs -I % cp % /some/targetdir/%

Always "quote" path strings, since most shell utils break when there are unescaped shell chars or white spaces in the string. There are rarely good reasons to use unquoted strings.

krysopath
  • Thank you @krysopath. Sorry I forgot to say that in my code it uses a batch name directory and a batch name. So the structure of the folders are that the folders are named as indicated and then there is another folder below with a random name which contains the files (the .gz ones) that I want to copy. So the above doesn't work because it is looking just in $inDir I assume? Is there a way to incorporate the -not -empty into the searching for the files part in sub-folders? – user5481267 May 13 '19 at 12:20
  • I think the latter kind of works, in that it moves the files. However it moves all files, even those that have 0 lines. I'm not sure how -not -empty works, but the files do have a size, but just no text in them. – user5481267 May 13 '19 at 13:55
  • I also failed to read your title: non-empty != more than one line of text :) I believe the issue with subdirs could be easily worked around. – krysopath May 15 '19 at 15:39
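
To address the two points raised in these last comments (the files sit in a randomly named sub-folder, and the `.gz` files have a size even when they contain no text), one workaround is to count lines of the decompressed stream rather than relying on `-not -empty`. A sketch only, assuming the files are gzip-compressed and reusing the `src` and `batch_name` names from the sketch under the first answer:

# Count lines of the decompressed content; an empty archive yields 0
lines=$(gzip -dc "$src" | wc -l)
if [ "$lines" -gt 1 ]; then
    cp "$src" "$outDir/${batch_name}.plink.gz"
fi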