Copy files containing all lines of an input file

Question

I want to copy the files in a directory that contain all the lines of an inputFile. Here is an example:

inputFile

Line3
Line1
LineX
Line4
LineB

file1

Line1
Line2
LineX
LineB

file2

Line100
Line10
LineB
Line4
LineX
Line3
Line1
Line4
Line1

The script is expected to copy only file2 to a destination directory since all lines of the inputFile are found in file2 but not in file1.

I could compare each file with inputFile individually, as partly discussed here, and copy the file manually if the script produces no output. That is:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 inputFile
Line3
Line4
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 inputFile

The first command prints Line3 and Line4, so file1 is missing lines and does not need to be copied. The second command prints nothing, which indicates that all lines of inputFile are found in file2, so I do a cp file2 ../distDir/.

Doing this by hand is time-consuming, and I hope there is some way to do it in a for loop. I am not tied to awk; any bash scripting tool can be used.

Thank you,

score 2 · Accepted Answer · answered Sep 13 '17 at 06:34


Assuming the following:

All the files you need to check are in the current directory
The base file is also in the current directory and named inputFile
The target path is ../distDir/

You can run a bash script like the following, which loops over all the files, compares each one against the base file and copies it if it contains every line of the base file.

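#!/bin/bash

inputFile="./inputFile"
targetDir="../distDir/"
for file in ./*; do
  # Skip directories, empty files and the base file itself
  [ -f "$file" ] && [ -s "$file" ] && [ "$file" != "$inputFile" ] || continue
  # Collect the lines of inputFile that are missing from the current file
  dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" "$inputFile")
  if [ -z "$dif" ]; then
    # File contains all lines, copy
    cp -- "$file" "$targetDir"
  fi
done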

answered Sep 13 '17 at 06:34

Cristian Ramon-Cortes


Thank you. Do you think `"$dif" == " "` will be false because of a blank-space difference between a file and inputFile? Should I pre-process the files if that is the case? – deepseefan Sep 13 '17 at 07:31
The `if` statement I provided only checks that the `awk` command has not produced ANY output, so for different behavior you need to tune the `awk` command. The `awk` command you provided checks whether each line of `inputFile` matches some line of `file` EXACTLY, so a line that differs only by a space will be printed. I suggest that you either preprocess the files, strip the spaces from the awk output (by adding `| somecmd`), or add more conditions to the `if`. – Cristian Ramon-Cortes Sep 13 '17 at 14:28
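
The whitespace handling discussed in these comments can also be done inside awk rather than in a separate pre-processing pass. The following is a minimal sketch, assuming that only leading and trailing spaces and tabs should be ignored; the names `contains_all_lines` and `norm` are hypothetical and not part of the original answer.

```shell
#!/bin/bash

# contains_all_lines FILE INPUTFILE   (hypothetical helper)
# Exit status 0 when every line of INPUTFILE occurs in FILE,
# ignoring leading and trailing spaces and tabs; 1 otherwise.
contains_all_lines() {
  awk '
    # Return s with leading and trailing blanks removed
    function norm(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Lines of the first file: remember them in normalized form
    FILENAME == ARGV[1] { seen[norm($0)]; next }
    # Lines of the second file: one unseen line is enough to fail
    !(norm($0) in seen) { missing = 1; exit }
    END { exit missing + 0 }
  ' "$1" "$2"
}
```

Inside the loop this could replace the `dif` test, for example `contains_all_lines "$file" "$inputFile" && cp -- "$file" "$targetDir"`. Testing `FILENAME == ARGV[1]` instead of `NR==FNR` also keeps an empty file from being treated as a match.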

RomanPerekhrest · Answer 2 · 2017-09-13T07:54:17.527

1

bash (with comm + wc commands) solution:

#!/bin/bash

n=$(sort -u inputFile | wc -l)   # number of distinct lines in inputFile
for f in /yourdir/file*
do
    # comm needs sorted input; -u keeps duplicate lines from skewing the count
    if [[ $n -eq $(comm -12 <(sort -u inputFile) <(sort -u "$f") | wc -l) ]]
    then
        cp "$f" "/dest/${f##*/}"
    fi
done

comm -12 FILE1 FILE2 - output only lines that appear in both files
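
To see why comparing line counts works, here is a small sketch that runs `comm -12` on the sample files from the question. The scratch directory and the helper name `count_common` are assumptions made for the illustration, not part of the answer.

```shell
#!/bin/bash

# count_common A B   (hypothetical helper)
# Prints the number of distinct lines present in both files.
# comm expects sorted input, hence the sort -u on each side.
count_common() {
  comm -12 <(sort -u "$1") <(sort -u "$2") | wc -l
}

# Recreate the sample files from the question in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' Line3 Line1 LineX Line4 LineB > "$tmp/inputFile"
printf '%s\n' Line1 Line2 LineX LineB > "$tmp/file1"
printf '%s\n' Line100 Line10 LineB Line4 LineX Line3 Line1 Line4 Line1 > "$tmp/file2"

# $(( ... )) strips any padding that wc may add around the number
echo "file1: $(( $(count_common "$tmp/inputFile" "$tmp/file1") )) of 5 lines found"
echo "file2: $(( $(count_common "$tmp/inputFile" "$tmp/file2") )) of 5 lines found"
rm -r "$tmp"
```

Only file2 reaches the full count of 5, so only file2 would be copied.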

edited Sep 13 '17 at 07:54

answered Sep 13 '17 at 07:33

RomanPerekhrest


RavinderSingh13 · Answer 3 · 2017-09-13T08:31:59.650

Could you please try the following and let me know if it helps. The system() call runs "echo cp " val " destination_path", so it only prints the command (e.g. cp file2 destination_path). Once you are happy with the printed result, remove echo and replace destination_path with the actual destination.

awk 'function check(array,val,count){
        if(length(array)==count){
           system("echo cp " val " destination_path")
}
}
FNR==NR{
  a[$0];
  next
}
val!=FILENAME{
  check(a,val,count)
}
FNR==1{
  val=FILENAME;
  count=total="";
  delete b
}
($0 in a) && !b[$0]++{
  count++
}
END{
  check(a,val,count)
}
' Input_file file1  file2

Will add explanation shortly too.

EDIT1: As per the OP, the files to be compared against Input_file could have any name, so I changed the code accordingly.

find -type f -exec awk 'function check(array,val,count){
        if(length(array)==count){
           system("echo cp " val " destination_path")
}
}
FNR==NR{
  a[$0];
  next
}
val!=FILENAME{
  check(a,val,count)
}
FNR==1{
  val=FILENAME;
  count=total="";
  delete b
}
($0 in a) && !b[$0]++{
  count++
}
END{
  check(a,val,count)
}
' Input_file {} +

Explanation: the code is annotated below.

find -type f -iname "file*" -exec awk 'function check(array,val,count){ ##find selects the regular files named file* and passes them to awk via -exec. The awk code starts here by defining check(), which takes the array, the file name (val) and the match count.
        if(length(array)==count){                    ##If the number of distinct Input_file lines equals count, every line was found in the file.
           system("echo cp " val " destination_path")##system() runs a shell command from awk. With echo it only prints the cp command for each matching file; remove echo once the output looks right.
}
}
FNR==NR{                                             ##FNR==NR is true only while the first file (Input_file) is being read.
  a[$0];                                             ##Store the current line as an index of array a.
  next                                               ##Skip the remaining rules for this line.
}
val!=FILENAME{                                       ##True on the first line of a new file, while val still holds the previous file name.
  check(a,val,count)                                 ##Check the file that has just been finished.
}
FNR==1{                                              ##FNR==1 is true on the first line of each file.
  val=FILENAME;                                      ##Remember the name of the file now being read.
  count=total="";                                    ##Reset count (and total) for the new file.
  delete b                                           ##Clear array b, which tracks the lines already counted in this file.
}
($0 in a) && !b[$0]++{                               ##The current line is in array a and has not yet been counted for this file.
  count++                                            ##Increment count by one.
}
END{                                                 ##END block, run after the last file.
  check(a,val,count)                                 ##Check the last file.
}
' Input_file {} +                                    ##Input_file is read first, followed by the files found by find.

PS: I wrote and tested this in GNU awk.
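
Because the cp command is assembled as a string and handed to system(), a file name containing spaces would be split by the shell. One possible way around this, sketched here as a variation rather than as the answer's own method, is to let awk print only the names of the matching files and leave the copying to the shell. The function name `list_matching` is hypothetical.

```shell
#!/bin/bash

# list_matching INPUT FILE...   (hypothetical helper)
# Prints the name of every FILE that contains all distinct lines of INPUT.
list_matching() {
  awk '
    # INPUT: record each distinct line once and count them
    FILENAME == ARGV[1] { if (!($0 in want)) { want[$0]; need++ }; next }
    # First line of a new FILE: report the previous file, then reset
    FNR == 1 {
      if (name != "" && got == need) print name
      name = FILENAME; got = 0; split("", hit)
    }
    # Count each wanted line at most once per file
    ($0 in want) && !hit[$0]++ { got++ }
    # Report the last file
    END { if (name != "" && got == need) print name }
  ' "$@"
}

# Possible usage (assumes no newlines in file names):
#   list_matching Input_file ./file* |
#     while IFS= read -r f; do cp -- "$f" destination_path; done
```

The per-file bookkeeping mirrors the val/count/b logic above, with the copy step moved out of awk so that the shell can quote each name.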

Thank you, and does this mean file names need to be specified manually? — deepseefan, Sep 13 '17 at 07:41
So if you have file names such as file1, file2, file3, you could pass Input_file file* instead. Kindly try it and let me know how it goes. We may also need to use close() if you have a large number of files; once you confirm that Input_file file* works, I will fine-tune the code. — RavinderSingh13, Sep 13 '17 at 07:55
Could you please check my EDIT1 solution now? I have changed the code so that it finds all regular files and passes them to awk. Let me know how it goes. — RavinderSingh13, Sep 13 '17 at 08:12
Thank you; but I need time to check your script. The above solutions were easier for me to understand and check right away. — deepseefan, Sep 13 '17 at 08:53
Not an issue, I tried my best to help and added an explanation too. Let me know in case you have any queries, enjoy learning. — RavinderSingh13, Sep 13 '17 at 08:59
