13

I'm looping through an array, and I want to test if an element is found in another array.

In pseudo-code, what I'm trying to do is this:

foreach $term (@array1) {
    if ($term is found in @array2) { 
        #do something here
    }
}

I've got the "foreach" and the "do something here" parts down pat ... but everything I've tried for the "if term is found in array" test does NOT work ...

I've tried grep:

if grep {/$term/} @array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in @array2

if (grep(/$term/, @array2)) { #do something }
# this test likewise succeeds for values NOT IN the array

I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.

I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...

... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)

...

I have an existing script which uses 'grep' somewhat successfully:

$rc=grep(/$term/, @array);
if ($rc eq 0) { #something happens here }

but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.

The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:

  @array=`cat file`;

whereas in new script I put the array inside the script itself (coz it's small) ... like this:

  @array=("element1","element2","element3","element4");

How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(

########################################################################

addendum ... some clarifications or examples of my actual code:

########################################################################

The term I'm trying to match/find/grep is a word element, for example "word123".

This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.

The term I'm searching for is stored in a variable which is instantiated via split:

foreach $line(@array1) {
  chomp($line);  # habit

  # every line has multiple elements that I want to capture
  ($term1,$term2,$term3,$term4)=split(/\t/,$line);  

  # if a particular one of those terms is found in my other array 'array2'
  if (grep(/$term2/, @array2) { 
    # then I'm storing a different element from the line into a 3rd array which eventually will be outputted
    push(@known, $term1) unless $seen{$term1}++;
  }
}

see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.

...

I just tried one of the below suggestions:

if (grep $_ eq $term2, @array2) 

And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.

MuleHeadJoe
  • Can you provide a short script (on pastebin or equivalent) that recreates your problem? That would help us diagnose what's going on. – Dancrumb Jul 06 '12 at 15:31
  • [How can I tell whether a certain element is contained in a list or array?](http://perldoc.perl.org/perlfaq4.html#How-can-I-tell-whether-a-certain-element-is-contained-in-a-list-or-array?) – Eugene Yarmash Jul 06 '12 at 15:32
  • http://stackoverflow.com/questions/2860226/how-can-i-check-if-a-perl-array-contains-a-particular-value – matthias krull Jul 06 '12 at 15:32
  • Yup... using a hash is the right thing to do here, otherwise you're making a solution that won't perform for large arrays (since you're scanning array2 for every element of array1). – Dancrumb Jul 06 '12 at 15:35
  • What is the value of `$term`? Provide examples of your search term and what you expect to match and not match. Are you seeking an exact match (`"foo"` matches only `"foo"`) or a partial match (`"foo"` matches `"food"`)? – mob Jul 06 '12 at 15:36
  • `@array =\`cat $file\`` is possibly considered a useless use of `cat`. Perl has a perfectly good (better) `open` command to use. – TLP Jul 06 '12 at 18:47

8 Answers

9

This is in perlfaq. A quick way to do it is

my %seen;
$seen{$_}++ for @array1;
for my $item (@array2) {
    if ($seen{$item}) {
        # $item (from @array2) is also in @array1, do something
    }
}

If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
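
For readers who would rather not rely on `$_`, the same hash lookup can be spelled out step by step. This is only a minimal sketch with made-up word lists; the names `%is_in_array2`, `$word` and `@found` are illustrative and not from the original post. It builds the lookup from `@array2` and tests each term of `@array1`, which is the direction the question asks about, and it lower-cases both sides so letter case is ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @array1 = ('Word123', 'other', 'WORD456');
my @array2 = ('word123', 'word456', 'word789');

# Step 1: build a lookup hash whose keys are the lower-cased
# elements of @array2. The value 1 simply means "present".
my %is_in_array2;
foreach my $word (@array2) {
    $is_in_array2{ lc($word) } = 1;
}

# Step 2: test each term from @array1 against the hash.
# A hash lookup does not scan @array2 again for every term.
my @found;
foreach my $term (@array1) {
    if ($is_in_array2{ lc($term) }) {
        push @found, $term;
        print "$term is in \@array2\n";
    }
}
```

The hash is built once, so the cost of checking each term does not grow with the size of `@array2`.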

ETA:

With the changed question: If the task is to match single words in @array2 against whole lines in @array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.

Unless @array2 is very large, you might do something like this:

my $rx = join "|", @array2;
for my $line (@array1) {
    if ($line =~ /\b(?:$rx)\b/) {  # group the alternatives so \b applies to every word
        # do something
    }
}

If @array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:

my $rx = join "|", map quotemeta, @array2;
# etc
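
Put together, the regex approach might look like the sketch below. The sample lines and words are invented for illustration, and the variable names `@escaped_words`, `$alternatives`, `$word_rx` and `@matched` are assumptions, not part of the original answer. The alternatives are wrapped in `(?:...)` so that `\b` applies to every word, and `quotemeta` is called in an explicit loop instead of `map`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: tab-separated lines and the words to look for.
my @array1 = ("id1\tword123\tx", "id2\tword1234\ty", "id3\ta.c\tz", "id4\tabc\tz");
my @array2 = ('word123', 'a.c');

# Escape regex metacharacters in every word (the "." in "a.c" becomes "\.").
my @escaped_words;
foreach my $word (@array2) {
    push @escaped_words, quotemeta($word);
}

# Join the words with "|" and compile the pattern once.
my $alternatives = join '|', @escaped_words;
my $word_rx      = qr/\b(?:$alternatives)\b/;

# Keep the lines that contain one of the words as a whole word.
my @matched;
foreach my $line (@array1) {
    if ($line =~ $word_rx) {
        push @matched, $line;
    }
}

print scalar(@matched), " matching lines\n";
```

With this data only the `id1` and `id3` lines match: `word1234` fails the word-boundary test, and `abc` fails because the dot is escaped.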
TLP
  • The advantage of this is that it's O(N). The naive solution is O(N^2). choroba's and cdarke's are O(N^2). – ikegami Jul 06 '12 at 16:01
  • I don't think this example will work as is, and I don't understand it well enough to see how it could be modified to suit. Array1 is the contents of a file, with each element of the array being an entire line from the file -- a row of data comprised of multiple elements. I have to chop that up to get the individual elements that I need to test against array2 which is a simpler array comprised of a plain list of single words. I can't compare an entire line from array1 against the single words in array2, that won't work. – MuleHeadJoe Jul 06 '12 at 18:30
  • @user1505587 You should have mentioned that in your question, it's a fairly important piece of information. So I take it then that you also want to ignore letter case. I will add the fix. – TLP Jul 06 '12 at 18:33
  • If you just want to find a list of words in a file, and you want a quick and dirty solution, why are you not just using `grep` with the `-f` option? – TLP Jul 06 '12 at 18:46
  • @TLP ... case is unimportant. And I don't think the source of the datasets is too important either since all data is reduced to arrays and elements (scalar vars). I have the element in hand, it is $term. I want to find IF $term exists in Array2. I don't need to pull anything out of Array2, this is an existence check only. If $term exists in Array2, then I have to do some work on the line that $term came out of (i.e., the original element from Array1). That also is already done, and not in question. – MuleHeadJoe Jul 06 '12 at 18:54
  • I just wanted to do all my work in one script. Grepping against the file outside of the perl script doesn't put the file data into the perl script where I want to manipulate it. I spose I could reduce the initial file using for i in list do grep $i oldfile >>newfile done, then all data in newfile would be relevant and I could run my perl script against that file and skip the grep-in-perl issue entirely. But that's at least one more step than I wanted to execute. Two scripts instead of one. No, perl is sposed to do the work, not me. – MuleHeadJoe Jul 06 '12 at 18:58
  • Oh, and the question didn't change, just your understanding of it ;-) ... I'm still just trying to see if an element stored as a scalar var is present in an array :-) – MuleHeadJoe Jul 06 '12 at 19:01
  • @MuleHeadJoe Since I can't read your mind, I would not know that you are looking for a substring and not an exact match. The details determine the solution, and the fewer details you give, the less accurate the answers will be. A quick and dirty solution would be to run the `grep` in a capture, such as `qx()`. But I did offer you a perl solution that should work in my update. – TLP Jul 06 '12 at 21:45
  • Since you are just looking for existence, check List::Util and List::MoreUtils - one of them provides an `any` function that will give you the benefit of bailing out once it hits a match. – RickF Jul 06 '12 at 21:51
  • @TLP thank you for all the information and suggestions. I'm still learning stuff here even if not all the suggestions work perfectly for my situation, and I appreciate you taking the time to post solutions and discuss it with me :-) – MuleHeadJoe Jul 06 '12 at 21:57
6

You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:

#!/usr/bin/perl
use strict;
use warnings;

my @array1 = qw/a b c d e f g h/; 
my @array2 = qw/a c e g z/; 

print "a in \@array1\n" if 'a' ~~ @array1;
print "z in \@array1\n" if 'z' ~~ @array1;
print "z in \@array2\n" if 'z' ~~ @array2;

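The example is very simple, but you can use a regular expression (RE) if you need to as well. I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features"; the operator was later marked experimental in Perl 5.18 and deprecated in 5.38. On a 5.10-era Perl it should be OK for this though.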

cdarke
  • I tried this: if ($term1 ~~ @array2) { print "$term found in array2\n"; } – MuleHeadJoe Jul 06 '12 at 19:13
  • sorry, got distracted mid-comment and it timed out on me ... but I was trying to use that indicated 'smart match operator' but it did not work for me. I'm using perl 5.10.1 on Cygwin. No errors, just didn't provide expected results. – MuleHeadJoe Jul 06 '12 at 19:31
  • @MuleHeadJoe - RE -> regular expression. I prefer cdarke's smart match solution. Am unsure, though, why it didn't work for you. For example, using cdarke's arrays: `do{print "$_ found in \@array2\n" if $_ ~~ @array2} for @array1;` shows a, c, e, g in @array2 – Kenosis Jul 06 '12 at 20:57
  • @MuleheadJoe - RE: either Religious Education or Regular Expression. I guess you need to figure out which is appropriate for the context. – cdarke Jul 07 '12 at 15:01
  • @Kenosis: Your results for that test appear to be correct to me, what else would you expect? a,c,e,g are all in both `@array1` and `@array2` (z is in `@array2` but not in `@array1`). – cdarke Jul 07 '12 at 15:05
  • @cdarke: Indeed, and I wanted to share this with MuleHeadJoe, as I prefer your smart matching solution for his "test if an element is found in another array" case. – Kenosis Jul 08 '12 at 16:25
5

This should work.

#!/usr/bin/perl
use strict;
use warnings;

my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;

for my $term (@array1) {
    if (grep $_ eq $term, @array2) {
        print "$term found.\n";
    }
}

Output:

a found.
c found.
e found.
g found.
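
`grep` always walks the whole of `@array2`, even after it has found a match. When only existence matters, `any` from List::Util stops at the first hit. The sketch below reuses the arrays from this answer; the `@found` array is an addition for illustration, and `any` assumes List::Util 1.33 or newer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any);   # any() needs List::Util 1.33 or newer

my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;

my @found;
for my $term (@array1) {
    # Inside the block, $_ is the element of @array2 being tested.
    # any() returns true as soon as one element equals $term.
    if (any { $_ eq $term } @array2) {
        push @found, $term;
        print "$term found.\n";
    }
}
```

This is still a scan of `@array2` for every term, so for large arrays a hash lookup remains the better fit.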
choroba
  • The OP is doing regex matches, not exact matches. But it's not clear what type of match he/she really wants. – mob Jul 06 '12 at 15:45
  • Is there a reason your example uses "for" and not "foreach"? Is it personal preference or is there a technical reason? In my script I use "foreach $line(@array1)" ... my array1 is a text file that I'm reading in (@array1=\`cat myfile\`;), chopping up each line so I can reorder elements and eventually output everything as a csv file that I can open & manipulate in Excel. – MuleHeadJoe Jul 06 '12 at 15:59
  • `for` and `foreach` are synonymous - use whichever you find to be more expressive. – RickF Jul 06 '12 at 16:05
  • @RickF ... thanks, I thought they had different features. I rarely if ever use 'for' so wasn't sure. – MuleHeadJoe Jul 06 '12 at 19:07
  • I think foreach was an accident in the design of an otherwise good language. What else could explain it? Many other languages use for, so I think for is clearer than foreach (in addition to being shorter). – Kjetil S. Jul 10 '17 at 18:12
2
#!/usr/bin/perl

@ar = ( '1','2','3','4','5','6','10' );
@arr = ( '1','2','3','4','5','6','7','8','9' ) ;

foreach $var ( @arr ){
    print "$var not found\n " if ( ! ( grep /$var/, @ar )) ;
}
pkm
1

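Pattern matching against the interpolated array is a compact way of testing for an element, though it also matches substrings (for example, "foo" would match "food") and treats regex metacharacters in $element as part of the pattern. For simple words this would do the trick. Cheers!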

print "$element found in the array\n" if ("@array" =~ m/$element/);
Swadhikar
0

Your 'actual code' shouldn't even compile:

if (grep(/$term2/, @array2) { 

should be:

if (grep (/$term2/, @array2)) { 

You have unbalanced parentheses in your code. You may also find it easier to use the block form of grep, which takes a code block followed by the list (the array) instead of an expression and a comma. It helps keep the parentheses from blurring together. This is optional, though. It would be:

if (grep {/$term2/} @array2) { 

You may want to use strict; and use warnings; to catch issues like this.

Oesor
  • my bad, I'm not cutting/pasting code ... code is on a physically separate machine ... the real deal had matched parens, and in that world I always check for syntax errors by doing "perl -cw [scriptname]" ... – MuleHeadJoe Jul 06 '12 at 19:23
0

The example below might be helpful, it tries to see if any element in @array_sp is present in @my_array:

#! /usr/bin/perl -w

@my_array = qw(20001 20003);

@array_sp = qw(20001 20002 20004);
print "@array_sp\n";

foreach $case(@my_array){
    if("@array_sp" =~ m/$case/){
    print "My God!\n";
    }

}

Using pattern matching can solve this. Hope it helps -QC

0
1. grep with eq, for an exact match:

    if (grep {$_ eq $term2} @array2) {
        print "$term2 exists in the array";
    }

2. grep with a regex, for a pattern (partial) match:

    if (grep {/$term2/} @array2) {
        print "element with pattern $term2 exists in the array";
    }
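
The two tests can give opposite answers on the same data, which may explain an "all or nothing" result. The sketch below uses invented data: one element merely contains the term, and the other carries a trailing newline, as lines read from a file do. The names `$regex_hits`, `$exact_hits`, `$clean_hits` and `@clean` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; "word123\n" mimics a line read from a file.
my @array2 = ("word1234", "word123\n");
my $term2  = 'word123';

# In scalar context grep returns the number of matching elements.
# Inside each block, $_ is the element of the array being tested.

# Regex test: "word123" is a substring of both elements.
my $regex_hits = grep { $_ =~ /\Q$term2\E/ } @array2;

# eq test: no element is exactly "word123"
# (one has an extra "4", the other a trailing newline).
my $exact_hits = grep { $_ eq $term2 } @array2;

# After chomp removes the trailing newline, eq finds one element.
chomp(my @clean = @array2);
my $clean_hits = grep { $_ eq $term2 } @clean;

print "regex: $regex_hits, eq: $exact_hits, eq after chomp: $clean_hits\n";
```

If the `eq` test fails for every term, printing each element between delimiters, for example `print "[$element]\n"`, is a quick way to spot stray whitespace or newlines.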
TKV