278

I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array.

I am reading a file for a parameter. I have a long list of parameters I do not want to deal with. I placed these unwanted parameters in an array @badparams.

I want to read a new parameter and if it does not exist in @badparams, process it. If it does exist in @badparams, go to the next read.

brian d foy
  • 129,424
  • 31
  • 207
  • 592
Mel
  • 3,855
  • 4
  • 24
  • 19
  • 4
    For the record, the answer does depend on your situation. It sounds like you want to make repeated lookups, so using a hash as jkramer suggests is good. If you only wanted to make one lookup, you might as well just iterate. (And in some cases you might want to binary search instead of using a hash!) – Cascabel May 18 '10 at 19:23
  • 5
    [perldoc -f grep](http://perldoc.perl.org/functions/grep.html) – Ether May 18 '10 at 21:02
  • 7
    For the record (and this may be totally inapplicable to your situation) it is generally a better idea to identify 'good values' and ignore the rest rather than trying to weed out known 'bad values'. The question you need to ask is whether it's possible there may be some bad values you don't know about yet. – Grant McLean May 18 '10 at 21:32
  • 1
    Use black magic to find out whether an item is in an array without inspecting the array! **As the question is, there is no answer possible!** "I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array." – U. Windl May 25 '23 at 11:53
  • Probably you want `%badparams`, not `@badparams`. – U. Windl May 25 '23 at 11:53

12 Answers12

256

Best general purpose - Especially short arrays (1000 items or less) and coders that are unsure of what optimizations best suit their needs.

# $value can be any regex. be safe
if ( grep( /^$value$/, @array ) ) {
  print "found it";
}

It has been mentioned that grep passes through all values even if the first value in the array matches. This is true, however grep is still extremely fast for most cases. If you're talking about short arrays (less than 1000 items) then most algorithms are going to be pretty fast anyway. If you're talking about very long arrays (1,000,000 items) grep is acceptably quick regardless of whether the item is the first or the middle or last in the array.

Optimization Cases for longer arrays:

If your array is sorted, use a "binary search".

If the same array is repeatedly searched many times, copy it into a hash first and then check the hash. If memory is a concern, then move each item from the array into the hash. More memory efficient but destroys the original array.

If same values are searched repeatedly within the array, lazily build a cache. (as each item is searched, first check if the search result was stored in a persisted hash. if the search result is not found in the hash, then search the array and put the result in the persisted hash so that next time we'll find it in the hash and skip the search).

Note: these optimizations will only be faster when dealing with long arrays. Don't over optimize.

Aaron T Harris
  • 3,038
  • 1
  • 17
  • 13
  • I find this more readable than the hash method. The only time the hash method would seem to make sense would be when you need to check for multiple values. – BHS Apr 09 '13 at 19:42
  • 12
    Double tilde was introduced in Perl 5.10 – Dennis Williamson Apr 27 '13 at 20:41
  • 15
    @DennisWilliamson ...and [in 5.18 it's considered experimantal](https://metacpan.org/pod/release/RJBS/perl-5.18.0/pod/perldelta.pod#The-smartmatch-family-of-features-are-now-experimental). – Grzegorz Rożniecki Oct 31 '13 at 17:56
  • 7
    Avoid smartmatch in production code. It's unstable/experimental pending further notice. – Vector Gorgoth Nov 26 '13 at 16:08
  • 1
    I find it also more readable but [Do not use](http://www.perlmonks.org/?node=How%20can%20I%20tell%20whether%20a%20list%20or%20array%20contains%20a%20certain%20element%3F) says that it is not efficient and checks every element even if it is the first. – giordano Dec 19 '13 at 13:37
  • Using perl 5.8.8 and it throws error: syntax error at script.pl line 646, near "$series_id ~" – Stalinko Feb 12 '15 at 09:30
  • 8
    Do not use if ("value" ~~ @array). ~~ is an experimental feature called Smartmatch. The experiment appears to be deemed a failure and will be removed or modified in a future version of Perl. – yahermann Oct 13 '17 at 20:02
  • 2
    Using `/^$value$/` is problematic if you are not sure about the contents of `$value` --- if there are regexp meta characters (e.g. just an open parenthesis), then the call will likely fail. Either write it `/^\Q$value\E$/`, or avoid the regexp completely, using an `eq` (which is unfortunately slightly less readable): `grep { $_ eq $value } ...`. – Slaven Rezic Sep 25 '19 at 10:10
  • In theory, using `List::Util::any` (available in perls since 5.20, otherwise upgrade it from CPAN) should be faster than using `grep` (and is equally readable), as it needs only half of the operations in average. – Slaven Rezic Sep 25 '19 at 10:11
208

Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;
jkramer
  • 15,440
  • 5
  • 47
  • 48
  • 44
    For the record, this code still does iterate through the array. The map{} call simply makes that iteration very easy to type. – Kenny Wyland Mar 03 '12 at 23:09
  • 3
    I'd only do this if your values in @badparams are pseudo-static and you plan to do a lot of checks against the map. I would not recommend this for a single check. – Aaron T Harris Dec 19 '12 at 21:33
  • Won't this go bang for an array with multiple items with the same value? – Rob Wells Feb 22 '13 at 11:16
  • 3
    @RobWells no, it will work fine. The next time it see's the same value, it will just overwrite the entry in the hash, which in this case sets it to `1` again. – andrewrjones Mar 18 '13 at 10:00
144

You can use smartmatch feature in Perl 5.10 as follows:

For literal value lookup doing below will do the trick.

if ( "value" ~~ @array ) 

For scalar lookup, doing below will work as above.

if ($val ~~ @array)

For inline array doing below, will work as above.

if ( $var ~~ ['bar', 'value', 'foo'] ) 

In Perl 5.18 smartmatch is flagged as experimental therefore you need to turn off the warnings by turning on experimental pragma by adding below to your script/module:

use experimental 'smartmatch';

Alternatively if you want to avoid the use of smartmatch - then as Aaron said use:

if ( grep( /^$value$/, @array ) ) {
  #TODO:
}
Bitmap
  • 12,402
  • 16
  • 64
  • 91
  • 4
    This is nice but seems to be new to Perl 5.10. Took some time before I figured out why I'm getting syntax errors. – Igor Skochinsky Jun 11 '14 at 16:12
  • 19
    **Warning:** you might want to avoid this one, since the operator has apparently different behavior in different versions, and has in the meantime been [marked as experimental](https://metacpan.org/pod/release/RJBS/perl-5.17.11/pod/perldelta.pod#Incompatible-Changes). So unless you have full control over your perl version (and who has that) you should probably avoid it. – Maze Jan 20 '15 at 09:17
  • 1
    I like [this explanation](http://blogs.perl.org/users/mike_b/2013/06/a-little-nicer-way-to-use-smartmatch-on-perl-518.html) about why setting `use experimental 'smartmatch'` is recommended. As I have control of my perl version (internal system), I use the `no warnings 'experimental::smartmatch';` statement. – lepe Jan 07 '16 at 02:39
  • PLEASE Avoid using this `if ( "value" ~~ @array ) ` . It is experimental, and often skips/misses out many matches. It produces random match results on the same data set, when run at multiple times. I almost lost my job when I used it on critical client data :-(( – Aquaholic Jun 24 '22 at 16:02
51

This blog post discusses the best answers to this question.

As a short summary, if you can install CPAN modules then the most readable solutions are:

any(@ingredients) eq 'flour';

or

@ingredients->contains('flour');

However, a more common idiom is:

any { $_ eq 'flour' } @ingredients

But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the ~~ "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list.

any() will stop as soon as it finds your value.

Check out the blog post for more details.

Grzegorz Rożniecki
  • 27,415
  • 11
  • 90
  • 112
mascip
  • 1,347
  • 1
  • 14
  • 16
  • 12
    _[any](http://perldoc.perl.org/List/Util.html#any)_ needs `use List::Util qw(any);`. `List::Util` is in [Core modules](http://perldoc.perl.org/index-modules-L.html). – Onlyjob Sep 26 '15 at 02:51
  • For the sake of clarification, contains is not either builtins either provided by CPAN, the blog just show how to implement it. – ninjaconcombre May 06 '21 at 08:26
26

Method 1: grep (may careful while value is expected to be a regex).

Try to avoid using grep, if looking at resources.

if ( grep( /^$value$/, @badparams ) ) {
  print "found";
}

Method 2: Linear Search

for (@badparams) {
    if ($_ eq $value) {
       print "found";
       last;
    }
}

Method 3: Use a hash

my %hash = map {$_ => 1} @badparams;
print "found" if (exists $hash{$value});

Method 4: smartmatch

(added in Perl 5.10, marked is experimental in Perl 5.18).

use experimental 'smartmatch';  # for perl 5.18
print "found" if ($value ~~ @badparams);

Method 5: Use the module List::MoreUtils

use List::MoreUtils qw(any);
@badparams = (1,2,3);
$value = 1;
print "found" if any {$_ == $value} @badparams;
Community
  • 1
  • 1
Kamal Nayan
  • 1,890
  • 21
  • 34
12

@eakssjo's benchmark is broken - measures creating hashes in loop vs creating regexes in loop. Fixed version (plus I've added List::Util::first and List::MoreUtils::any):

use List::Util qw(first);
use List::MoreUtils qw(any);
use Benchmark;

my @list = ( 1..10_000 );
my $hit = 5_000;
my $hit_regex = qr/^$hit$/; # precompute regex
my %params;
$params{$_} = 1 for @list;  # precompute hash
timethese(
    100_000, {
        'any' => sub {
            die unless ( any { $hit_regex } @list );
        },
        'first' => sub {
            die unless ( first { $hit_regex } @list );
        },
        'grep' => sub {
            die unless ( grep { $hit_regex } @list );
        },
        'hash' => sub {
            die unless ( $params{$hit} );
        },
    });

And result (it's for 100_000 iterations, ten times more than in @eakssjo's answer):

Benchmark: timing 100000 iterations of any, first, grep, hash...
       any:  0 wallclock secs ( 0.67 usr +  0.00 sys =  0.67 CPU) @ 149253.73/s (n=100000)
     first:  1 wallclock secs ( 0.63 usr +  0.01 sys =  0.64 CPU) @ 156250.00/s (n=100000)
      grep: 42 wallclock secs (41.95 usr +  0.08 sys = 42.03 CPU) @ 2379.25/s (n=100000)
      hash:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
Community
  • 1
  • 1
Grzegorz Rożniecki
  • 27,415
  • 11
  • 90
  • 112
  • 7
    If you want to test for multiple elements, then creating the hash in advance saves you time. But if you just want to know whether it contains a single element, then you don't have the hash already. Therefore, creating the hash should be part of the computing time. Even more so for the regular expression: you need a new regexp for each element you look for. – fishinear Jan 24 '13 at 12:27
  • 1
    @fishinear True, but if you're only interested in one check, not multiple checks, then obviously it's microoptimization to even wonder about which method is quicker because those microseconds don't matter. *If* you'd want to redo this check, hash is the way to go, cause cost of creating hash once is small enough to be ignored then. Above benchmarks measure *only* different ways of testing, not including any setup. Yes, this may be invalid in your use case, but again - if you're doing only single check you should use whatever is most readable to you and your mates. – Grzegorz Rożniecki Dec 21 '15 at 09:51
11

Even though it's convenient to use, it seems like the convert-to-hash solution costs quite a lot of performance, which was an issue for me.

#!/usr/bin/perl
use Benchmark;
my @list;
for (1..10_000) {
    push @list, $_;
}

timethese(10000, {
  'grep'    => sub {
            if ( grep(/^5000$/o, @list) ) {
                # code
            }
        },
  'hash'    => sub {
            my %params = map { $_ => 1 } @list;
            if ( exists($params{5000}) ) {
                # code
            }
        },
});

Output of benchmark test:

Benchmark: timing 10000 iterations of grep, hash...
          grep:  8 wallclock secs ( 7.95 usr +  0.00 sys =  7.95 CPU) @ 1257.86/s (n=10000)
          hash: 50 wallclock secs (49.68 usr +  0.01 sys = 49.69 CPU) @ 201.25/s (n=10000)
aksel
  • 127
  • 2
  • 4
  • 5
    Using [`List::Util::first`](http://perldoc.perl.org/List/Util.html) is faster as it stops iterating when it finds a match. – RobEarl Dec 06 '12 at 11:42
  • 2
    -1 Your benchmark has defects, `grep` is **significantly** slower than creating hash and doing lookup, since former is O(n) and latter O(1). Just do the hash creation only once (outside the loop) and precompute regex to measure methods only ([see my answer](http://stackoverflow.com/a/13939491/708434)). – Grzegorz Rożniecki Dec 18 '12 at 18:44
  • 4
    @Xaerxess: In my case I wanted to do _one_ lookup, so I think it's fair to count both hash/regex creation and the lookup/grep. It the task would be to do many lookups, I guess your solution is better. – aksel Jan 02 '13 at 06:53
  • 3
    If you want to do only one iteration, the difference is indistinguishable between any of methods you choose, so any benchmark is wrong since it's an evil microoptimization in this case. – Grzegorz Rożniecki Jan 02 '13 at 09:23
  • Well of course it takes longer to convert an array to a hash and do a single lookup, than to do a single lookup in that array. – Jonathon Mar 04 '13 at 04:10
  • 2
    The regex is compiled only once, as it has the flag 'o'. – Apoc Apr 29 '14 at 16:08
3

@files is an existing array

my @new_values =  grep(/^2[\d].[\d][A-za-z]?/,@files);

print join("\n", @new_values);

print "\n";

/^2[\d].[\d][A-za-z]?/ = vaues starting from 2 here you can put any regular expression

Rohan
  • 91
  • 1
  • 3
2

You certainly want a hash here. Place the bad parameters as keys in the hash, then decide whether a particular parameter exists in the hash.

our %bad_params = map { $_ => 1 } qw(badparam1 badparam2 badparam3)

if ($bad_params{$new_param}) {
  print "That is a bad parameter\n";
}

If you are really interested in doing it with an array, look at List::Util or List::MoreUtils

David M
  • 4,325
  • 2
  • 28
  • 40
1

If you need to know the amount of every element in array besides existing of that element you may use

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;
%bad_param_lookup = map { $_ => $bad_param_lookup{$_}++} @bad_params;

and then for every $i that is in @bad_params, $bad_param_lookup{$i} contains amount of $i in @bad_params

0

There are two ways you can do this. You can use the throw the values into a hash for a lookup table, as suggested by the other posts. ( I'll add just another idiom. )

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;

But if it's data of mostly word characters and not too many meta, you can dump it into a regex alternation:

use English qw<$LIST_SEPARATOR>;

my $regex_str = do { 
    local $LIST_SEPARATOR = '|';
    "(?:@bad_params)";
 };

 # $front_delim and $back_delim being any characters that come before and after. 
 my $regex = qr/$front_delim$regex_str$back_delim/;

This solution would have to be tuned for the types of "bad values" you're looking for. And again, it might be totally inappropriate for certain types of strings, so caveat emptor.

Axeman
  • 29,660
  • 2
  • 47
  • 102
  • 1
    You can also write `@bad_param_lookup{@bad_params} = ()`, but you'd need to use `exists` to test membership. – Greg Bacon May 19 '10 at 01:40
-1
my @badparams = (1,2,5,7,'a','zzz');

my $badparams = join('|',@badparams);   # '|' or any other character not present in params

foreach my $par (4,5,6,7,'a','z','zzz')
{
    if ($badparams =~ /\b$par\b/)
    {
        print "$par is present\n";
    }
    else
    {
        print "$par is not present\n";
    }
}

You may want to check for numerical leading spaces consistancy

Serge
  • 1
  • 1
  • 2