I have to exclude 1M records (let's say each record is a phone number) from a code flow.

Currently I am thinking of storing all these records in an array and, for each incoming record, checking whether it starts with (via regex) any of the stored values in the array.
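
For illustration, a rough sketch of what I mean (Python assumed; `excluded_prefixes` stands in for the stored records):

```python
# Rough sketch of the current idea (Python assumed);
# excluded_prefixes stands in for the ~1M stored records.
excluded_prefixes = ["1800", "1900"]  # placeholder values

def should_exclude(record):
    # Linear scan over every stored value for each incoming record.
    return any(record.startswith(prefix) for prefix in excluded_prefixes)

print(should_exclude("18005551234"))  # True
```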

The time complexity of this method is O(n). Can I do this in a more efficient way?

1 Answer

Complexity

Your task sounds like it is at least O(N), where N is the number of records you have to process. If you are filtering out N of the M total records (e.g. 1M records out of 1B total), then it will be O(M) unless the M records are already indexed or stored in an appropriate data structure (e.g. a trie).
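
For example, a minimal prefix-trie sketch (Python assumed; the names `build_trie` and `is_excluded` are illustrative): building it costs roughly the total number of characters across the excluded prefixes, and each lookup then costs only the length of the record being checked.

```python
# Minimal prefix-trie sketch (Python assumed; names are illustrative).

class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False  # an excluded prefix ends at this node


def build_trie(prefixes):
    # O(total characters across all prefixes) to build.
    root = TrieNode()
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def is_excluded(record, root):
    # O(len(record)) per lookup, independent of how many prefixes are stored.
    node = root
    for ch in record:
        node = node.children.get(ch)
        if node is None:
            return False
        if node.terminal:
            return True
    return False


# Example:
root = build_trie(["1800", "1900"])
print(is_excluded("18005551234", root))  # True
print(is_excluded("12125551234", root))  # False
```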

Wall clock time

The good news is that processing 1M records using string operations should take relatively little wall clock time (seconds at worst).

I recommend you extract 10K records and time your analysis on those in order to get a rough estimate of the total processing cost.
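
For instance, a rough timing sketch (Python assumed; `sample_10k.txt` and the `build_trie`/`is_excluded` helpers from the sketch above are illustrative):

```python
# Rough timing sketch (Python assumed). sample_10k.txt is a hypothetical
# file with one record per line; is_excluded and root come from the
# trie sketch above.
import time

with open("sample_10k.txt") as f:
    sample = [line.strip() for line in f]

start = time.perf_counter()
kept = [r for r in sample if not is_excluded(r, root)]
elapsed = time.perf_counter() - start

# 10K -> 1M is a factor of 100, so scale the measurement accordingly.
print(f"{len(sample)} records in {elapsed:.3f}s; "
      f"~{elapsed * 100:.1f}s projected for 1M records")
```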

Caveats

A regex is a good filtering mechanism if the "language" of valid (or invalid) records is well defined.

If you are applying a regex to unsanitized client data, beware of Regular Expression Denial of Service. See my post here for details.
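
For example, the classic nested-quantifier shape that triggers catastrophic backtracking (illustrative only, not taken from that post):

```python
# Classic catastrophic-backtracking example (illustrative only).
import re

evil = re.compile(r"^(\d+)+$")   # nested quantifiers
payload = "1" * 30 + "x"         # almost matches, then fails at the end

# Uncomment with care: on a backtracking engine such as CPython's `re`,
# this match can take an extremely long time.
# evil.match(payload)
```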
