2

I have two files with two columns separated by space

cat file1.txt
281475225437349 33,32,21,17,20,22,18,30,19,16,23,31
281475550885480 35,32,33,21,39,40,57,36,41,17,20,38,34,37,16,99

cat file2.txt
281475550885480 16,17,20,21,32,33,34,35,36,37,38,39,40,41
281475225437349 16,17,18,19,20,21,22,23,24,25,30,31,32,33

i want to compare values from file1 column2 with file2 column2 for same value in column1. and print only those values which does exist in file1 column2 but not in file2 column2 along and not vice versa along with respective value in column1

Desired o/p

it should not print anything for 281475225437349 as all values from file1 column2 present in file2 column 2 for 281475225437349

it should only print values for 281475550885480 which present in file1 column2 but not present in file2 column2 . Like values 57 and 99 for 281475550885480

So o/p file like:

cat output.txt
281475550885480 57,99

I have tried sorting the file and the comparing using sdiff but it give difference and its taking time

sdiff file1.txt file2.txt
  • I've added some formatting to your post to make it easier to understand. But there seems to be some repetition in there. Can you please check it and remove any duplicate text. – Dave Cross Jan 25 '19 at 11:55
  • Welcome to SO @Jeevan Patil, your question is NOT clear. Please do mention 3 simple things, 1- sample of input, 2- sample of output and 3rd- whatever you have tried in order to fix it in CODE TAGS `{}` button and let us know then. – RavinderSingh13 Jan 25 '19 at 12:04
  • Hi . thanks for the suggestion. I edited it and removed duplicated statements – Jeevan Patil Jan 25 '19 at 12:14

6 Answers6

3

Perl solution: create a hash of hashes from the second file. The key is the large number, the inner keys are the smaller numbers from the comma separated list. Then iterate over the first file and check what numbers aren't mentioned in the remembered structure.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

open my $f1, '<', 'file1' or die $!;
open my $f2, '<', 'file2' or die $!;

my %seen;
while (<$f2>) {
    my ($key, $value_string) = split ' ';
    my @values = split /,/, $value_string;  #/
    undef @{ $seen{$key} }{@values};
}
while (<$f1>) {
    my ($key, $value_string) = split ' ';
    my @values = split /,/, $value_string;
    my %surplus;
    undef @surplus{@values};
    delete @surplus{ keys %{ $seen{$key} } };
    say $key, ' ', join ',', keys %surplus
        if keys %surplus;
}

BTW, when you switch the files, the output will be

281475225437349 24,25

because 24 and 25 aren't present in file1.

choroba
  • 231,213
  • 25
  • 204
  • 289
  • hi thanks for you input. But my perl version is old and doesnt support feature.pm. I cant up-grade perl. can you suggest some alternate to it ? – Jeevan Patil Jan 25 '19 at 16:39
  • @JeevanPatil: Just use [print](http://p3rl.org/print) instead of [say](http://p3rl.org/say) and append a `"\n"`. – choroba Jan 25 '19 at 16:45
  • Great.. print is working . thanks . But at what line i should append "\n" ? – Jeevan Patil Jan 25 '19 at 20:21
  • 1
    `print $key, ' ', (join ',', keys %surplus), "\n" if keys %surplus;` – choroba Jan 25 '19 at 20:37
  • it worked as expected . Thanks once again. I just came across the problem where in file2 column 2 has no value for some object in column 1.. Like below...... – Jeevan Patil Jan 25 '19 at 20:57
  • cat file1.txt 281475225437349 33,32,21,17,20,22,18,30,19,16,23,31 281475550885480 35,32,33,21,39,40,57,36,41,17,20,38,34,37,16,99 cat file2.txt 281475550885480 16,17,20,21,32,33,34,35,36,37,38,39,40,41 281475225437349 So o/p should be 281475225437349 33,32,21,17,20,22,18,30,19,16,23,31 281475550885480 57,99 But it gives error for this condition – Jeevan Patil Jan 25 '19 at 21:00
  • its not working with ~1k values in column2. I have ~2k values from 16 till 2016 in column2 for some objects and it doesnt work there. – Jeevan Patil Jan 25 '19 at 21:17
  • 1
    Oh, I see, to fix the error, add the `if` condition to `undef @{ $seen{$key} }{@values} if @values;`. – choroba Jan 25 '19 at 21:26
  • Thanks .. It is working exactly. I would test this script with more scenarios. :) – Jeevan Patil Jan 28 '19 at 07:32
1

You must use two loop in your code. Also split ',' character and set two array, after

foreach(var itemOne as arrayOne){
boolean isExist=false;
foreach(var itemTwo as arrayTwo)
if(itemOne==itemTwo) isExist=true;

if(isExist) console.log(itemOne+" is exist");
}
1

another awk

$ awk -v c=, 'NR==FNR {a[$1]=$2; next}
                      {for(i=2;i<=NF;i++) 
                         {if(c a[$1] c !~ c $i c) p=(p==""?$1" ":p c) $i} 
                          if(p) print p; p=""}' file2 FS=' |,' file1

281475550885480 57,99
karakfa
  • 66,216
  • 7
  • 41
  • 56
0

This is straightforward with awk:

awk '(NR==FNR) { a[$1]=","$2","; next }
     { delete b }                   # clear array for new run
     { split($2,f,",") }            # split string of file1 in array f
     { for(i in f) if (! match(a[$1],"," f[i] ",")) b[f[i]]  }                                                                                                                                                                             
     { c=$1" "; for(i in b) {printf c i; c="," }; if (c==",") printf "\n" }' file2 file1

which returns:

281475550885480 57,99

The idea is to store the string of column two with an extra two <comma>'s to ensure that every number is between commas. Then search for the comma-sandwiched substring.

If you have to do a real number check and you might have to compare numbers "5" with "005" and they should be equal, then you should do the following:

awk '(NR==FNR) { a[$1]=$2; next }
     { delete b }                       # clear array
     { # split strings in number elements
       split($2,f,",");    for(i in f) f1[f[i]+0];
       split(a[$1],f,","); for(i in f) f2[f[i]+0]
     }
     { for(i in f1) if (! (i in f2)) b[i] }
     { c=$1" "; for(i in b) {printf c i; c="," }; if (c==",") printf "\n" }' file2 file1
kvantour
  • 25,269
  • 4
  • 47
  • 72
0

This might work for you (GNU sed):

sed -r 's#^(\S+)\s(\S+)$#/^\1 /s/$/,\\n\2,/#' file2 |
sed -rnf - -e ':a;s/(\b[0-9]+,)(.*\n.*)\1/\2/;ta;s/(.*),\n.*/\1/p' file1

The solution can be split into two parts.

First file2 is turned into a sed script which appends the values for each key to the same key in file1. Additional ,'s are added to make matching easier and will act as indicator of whether a line from file1 has values to be printed.

The script generated from file2 is piped into a second sed invocation and an additional subscript uses substitution and a loop to remove matching values from file1.

On a match failing, and because the second sed invocation uses the -n switch to make printing explicit, a final match removes the introduced , and non-matched values following the newline and prints the required result.

potong
  • 55,640
  • 6
  • 51
  • 83
0
$ cat tst.awk
BEGIN { FS="[ ,]" }
NR==FNR {
    for (i=2; i<=NF; i++) {
        file2[$1,$i]
    }
    next
}
{
    diff = ""
    for (i=2; i<=NF; i++) {
        if ( !(($1,$i) in file2) ) {
            diff = (diff == "" ? "" : diff ",") $i
        }
    }
    if (diff != "") {
        print $1, diff
    }
}

$ awk -f tst.awk file2 file1
281475550885480 57,99
Ed Morton
  • 188,023
  • 17
  • 78
  • 185