
I am having trouble writing a regex to detect Aadhaar numbers in a DLP tool.

The built-in patterns are:

\b[2-9][0-9]{11}\b
\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b

These patterns work, but they produce many false positives because they also match digits read vertically. For example, the three lines below are treated as a single Aadhaar number, which I do not want:

2355(New Line)
2345(New Line)
7868

I also want the search restricted to exactly 12 digits: a run of 11 or 13 digits should not be counted.

I tried the pattern below. Is it suitable for searching an entire document for Aadhaar numbers?

^[2-9][0-9]{3}\s[0-9]{4}\s[0-9]{4}$
bobble bubble
  • If you want to match those numbers also without whitespace, make them optional [like in this demo](https://regex101.com/r/r7vD1x/1). – bobble bubble Sep 25 '22 at 01:05
  • In your demo the vertical chunks are also highlighted, which means the regex is reading them vertically. Aadhaar numbers are normally written horizontally, so we don't want vertical matches. – Suraj Hegde Sep 25 '22 at 04:50

2 Answers


Your RegEx looks right to me.

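But keep in mind that your pattern is anchored: ^ and $ match the start and end of a line in multi-line mode (or of the whole input otherwise), so it only matches a number that sits alone on its line.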
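To make the effect of the anchors concrete, here is a minimal Python sketch. It assumes Python's `re` engine with `re.MULTILINE`; your DLP engine may behave differently, and the variable names and sample strings are made up for illustration.

```python
import re

# The anchored pattern from the question; re.MULTILINE makes ^ and $
# match at the start and end of every line (an assumption about the engine).
ANCHORED = re.compile(r"^[2-9][0-9]{3}\s[0-9]{4}\s[0-9]{4}$", re.MULTILINE)

# Hypothetical sample inputs, not real Aadhaar numbers.
alone_on_line = "2345 6789 0123"
inside_sentence = "My Aadhaar is 2345 6789 0123, please update it."
vertical = "2355\n2345\n7868"

found_alone = bool(ANCHORED.search(alone_on_line))      # number alone on its line
found_inside = bool(ANCHORED.search(inside_sentence))   # number embedded in text
found_vertical = bool(ANCHORED.search(vertical))        # digits stacked on three lines

print(found_alone, found_inside, found_vertical)
```

In this sketch the number embedded in a sentence is not found, while the stacked digits still are, because `\s` also matches a newline.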

You can experiment with it in this regex101 share link.

Also, you can check this geeksforgeeks.org post for more details.


After reading the comment below I revised my answer to this:

\b[2-9][0-9]{3}[^\S\r\n][0-9]{4}[^\S\r\n][0-9]{4}\b

I used Greg Bacon's answer for matching whitespace but not newlines and combined it with yours. Check the updated regex101 share link to test it further.
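A quick way to check the revised pattern outside regex101 is a small Python sketch like the one below. It assumes Python's `re` semantics; the helper name `has_aadhaar` and the sample strings are hypothetical.

```python
import re

# Revised pattern: [^\S\r\n] matches whitespace that is not a line break,
# so the three groups must sit on the same line.
HORIZONTAL = re.compile(r"\b[2-9][0-9]{3}[^\S\r\n][0-9]{4}[^\S\r\n][0-9]{4}\b")


def has_aadhaar(text: str) -> bool:
    """Return True if the text contains a horizontally written 4-4-4 number."""
    return HORIZONTAL.search(text) is not None


# Hypothetical sample inputs, not real Aadhaar numbers.
print(has_aadhaar("My Aadhaar is 2345 6789 0123, thanks"))  # same line, in a sentence
print(has_aadhaar("2355\n2345\n7868"))                      # stacked vertically
print(has_aadhaar("12345 6789 0123"))                       # 13 digits, extra leading digit
print(has_aadhaar("2345 6789 01234"))                       # 13 digits, extra trailing digit
print(has_aadhaar("234567890123"))                          # 12 digits with no separators
```

Note that this pattern only covers the spaced form. The unspaced 12-digit form would still need the separate \b[2-9][0-9]{11}\b pattern from the question.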

Good luck.

Ofer Calvo
  • Thank you for the response. I have visited these sites and I am using a regex tester too. The only problem with the built-in pattern that uses \b is that it also reads 4-digit groups vertically, such as cells in a table or an Excel column. The pattern with the caret and dollar sign only matches at the start and end. Will it work for a whole-document search, restricted to exactly 12 digits? I have used it in my DLP, but with $ the DLP does not recognise an Aadhaar number mentioned in a mail body. I also tried \n, with the same result. – Suraj Hegde Sep 25 '22 at 04:47

Regex: \b(\d{4}\s\d{4}\s\d{4})\b|\b(\d{12})\b|\b(\d{4}-\d{4}-\d{4})\b

The pattern matches the following formats: 0000 0000 0000, 0000-0000-0000, and 000000000000.

This works for numbers with exactly 12 digits.
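As a rough check of this pattern, the Python sketch below runs it against a few made-up inputs. It assumes Python's `re` semantics, and the helper name `matches` is hypothetical.

```python
import re

# The three-alternative pattern from this answer.
THREE_FORMATS = re.compile(
    r"\b(\d{4}\s\d{4}\s\d{4})\b|\b(\d{12})\b|\b(\d{4}-\d{4}-\d{4})\b"
)


def matches(text: str) -> bool:
    """Return True if any of the three formats occurs in the text."""
    return THREE_FORMATS.search(text) is not None


# Hypothetical sample inputs, not real Aadhaar numbers.
print(matches("0000 0000 0000"))    # space-separated
print(matches("0000-0000-0000"))    # hyphen-separated
print(matches("000000000000"))      # 12 contiguous digits
print(matches("12345678901"))       # 11 digits
print(matches("1234567890123"))     # 13 digits
print(matches("2355\n2345\n7868"))  # stacked vertically
```

Under these assumptions the 11- and 13-digit runs are rejected, but the stacked digits still match because `\s` includes newlines. The pattern also accepts a leading 0 or 1, unlike the [2-9] patterns in the question.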
