
I am trying to extract the complete Aadhaar number (12 digits) from an image of an Aadhaar card (India).


I am able to identify the region with the QR code. To extract the information, I have been looking into Python libraries that read and decode Secure QR codes on Indian Aadhaar cards. These two libraries seem particularly useful for this use case:

  1. pyaadhaar
  2. aadhaar-py

I am unable to decode the Secure QR code on Aadhaar cards using them. Information on the Secure QR code is available here. Please recommend possible resolutions or other methods to achieve this task.

Here is my code for decoding secure QR code using these libraries. Python version: 3.8

import sys

from pyaadhaar.utils import Qr_img_to_text, isSecureQr
from pyaadhaar.deocde import AadhaarSecureQr
from pyaadhaar.deocde import AadhaarOldQr

qrData = Qr_img_to_text(sys.argv[1])
print(qrData)

if len(qrData) == 0:
    print(" No QR Code Detected !!")
else:
    isSecureQR = (isSecureQr(qrData[0]))
    if isSecureQR:
        print("Secure QR code")
        try:
            obj  = AadhaarSecureQr(qrData[0])
        except:
            print("Try aadhaar-py library")
            from aadhaar.qr import AadhaarSecureQR
            integer_scanned_from_qr = 123456
            # secure_qr = AadhaarSecureQR(integer_scanned_from_qr)
            secure_qr = AadhaarSecureQR(int(qrData[0]))
            decoded_secure_qr_data = secure_qr.extract_data()
            print(decoded_secure_qr_data)

Here are the issues I am facing with these libraries:

  1. pyaadhaar: The Secure QR decoding code tries to convert the base-10 string to bytes and fails. NOTE: For the old QR code format of the Aadhaar card, the pyaadhaar library works well; this issue only occurs for the Secure QR code. Stack trace below:

    File "/home/piyush/libs/py38/lib/python3.8/site-packages/pyaadhaar/deocde.py", line 23, in __init__
    bytes_array = base10encodedstring.to_bytes(5000, 'big').lstrip(b'\x00')
    

    AttributeError: 'str' object has no attribute 'to_bytes'

  2. aadhaar-py: Secure QR decoding fails because it is unable to validate the integer received from the QR code. Stack trace below:

    Traceback (most recent call last):
      File "/home/piyush/libs/py38/lib/python3.8/site-packages/aadhaar/qr.py", line 55, in __init__
        self.decompressed_byte_array = zlib.decompress(self.byte_array, wbits=16+zlib.MAX_WBITS)
    zlib.error: Error -3 while decompressing data: incorrect header check

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "aadhaarQRCode.py", line 52, in <module>
        secure_qr = AadhaarSecureQR(integer_scanned_from_qr)
      File "/home/piyush/libs/py38/lib/python3.8/site-packages/aadhaar/qr.py", line 57, in __init__
        raise MalformedIntegerReceived('Decompression failed, please send a valid integer received from QR code')
    aadhaar.exceptions.MalformedIntegerReceived: Decompression failed, please send a valid integer received from QR code

Piyush Makhija
  • I don't know about Aadhaar. Can you explain the process? Do you need to read a standard QR code and interpret the string in a special way, or do you have to read the QR code graphic in a special way, differently from other QR codes? – Micka Sep 20 '21 at 11:05
  • Hey @Micka, thanks for asking this. Let me give you some context on Indian Aadhaar cards. They are basically a universal ID for Indian citizens and have a unique 12-digit number attached to them as a UID. Recent versions of this card have introduced machine-readable Secure QR codes (adding a link to detailed info on the Secure QR code in the question above). – Piyush Makhija Sep 21 '21 at 05:32
  • @Micka Here is the process: 1. Reading the Secure QR code on an Aadhaar card yields a base-10 numeric code (assuming you already have the region of interest and a method to read the QR code). 2. You need to decode this base-10 code as per UIDAI guidelines to find the details for the given card. This is where I'm stuck. The two libraries mentioned above try to do this, but I'm running into issues with them... 3. This information can then be used for document verification. – Piyush Makhija Sep 21 '21 at 05:32
  • So your question is about how to decode Aadhaar data and not about how to find and read the QR code? Or are you in doubt whether you read the QR code correctly? If the code is as you use it: you are hard-coding dummy data as the input: integer_scanned_from_qr = 123456. – Micka Sep 21 '21 at 06:03
  • Yes, that is correct. Decoding Aadhaar data is the problem I'm facing here. I was trying the dummy input as a sanity check. My bad. Let me update it. – Piyush Makhija Sep 22 '21 at 03:14

4 Answers


I think I have identified two issues:

  • The quality of the posted sample image is not good enough.
  • The posted sample is only an illustration, not a real "Secure QR code" (isSecureQr returns False).

Resizing the input by a factor of 2 allows reading the QR code:

Reading, resizing and saving as a new image:

import cv2

image_file_name = 'image.png'

img = cv2.imread(image_file_name, cv2.IMREAD_GRAYSCALE)  # Read image as grayscale.
img2 = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2), interpolation=cv2.INTER_LANCZOS4)  # Resize by x2 using LANCZOS4 interpolation method.

cv2.imwrite('image2.png', img2)

Complete code sample:

import cv2
from pyaadhaar.utils import Qr_img_to_text, isSecureQr
from pyaadhaar.deocde import AadhaarSecureQr
from pyaadhaar.deocde import AadhaarOldQr

image_file_name = 'image.png'

img = cv2.imread(image_file_name, cv2.IMREAD_GRAYSCALE)  # Read image as grayscale.
img2 = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2), interpolation=cv2.INTER_LANCZOS4)  # Resize by x2 using LANCZOS4 interpolation method.

cv2.imwrite('image2.png', img2)

#qrData = Qr_img_to_text(image_file_name)
qrData = Qr_img_to_text('image2.png')

if len(qrData) == 0:
    print(" No QR Code Detected !!")
else:
    print(qrData[0])  # Print only after confirming that a code was found.
    isSecureQR = isSecureQr(qrData[0])

Output:

BEGIN:VCARD
VERSION:2.1
N:John Doe
TEL;HOME;VOICE:555-555-5555
TEL;WORK;VOICE:666-666-6666
EMAIL:email@example.com
ORG:TEC-IT
URL:http://www.example.com
END:VCARD

As you can see, the information is readable.


I don't know the reason for the error messages.
I am using Python 3.6 and Windows 10, and there are no errors.


Update:

I think I found a good QR sample on the UIDAI website (see the link in the comments below).

You may use the following stages for reading and decoding the QR code:

  • Read the image and convert to Grayscale:

     img = cv2.imread('QR-code.png')
     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
  • Decode the QR image using pyzbar:

     from pyzbar.pyzbar import decode
    
     code = decode(gray)
     qrData = code[0].data
    

The output is:

qrData = b'237497180427052647783300246878396583799255456489987408759166130356134643238983204787052430218690134448936236864297276771641634999080575609492311571968765609069136805162795787818778890741929781895329518555534628817257859463788635275354327100048171708000325455696214859435055982035280625178771327874404740223098923855931735123211424008984993414889525648814023601502480073175359474094864095768013856646824722485966946781959691939896480916439963789372921245279188919967571594991892583831959179470233309402224813212053115252333144274173015884097724340221510290493265083250284729564479442141970463376503376128450886353432131739468676865011145775113963085344863721542370515721151063616022795356622752779960808292884610326449153900132740777567083486894811375361411256365025505831684920053653333590355498425481490152208693776745840907561757284344911039321352592538813121495287462965579977211982037225529105267305637234607223545819819999563772042419688414524722016381079017938639028373842948289315251828624712491144607338918506248290136467138960572776308085467315675402172852228780627542084715957463184467446026357490159041267929151850801008711659835740734383540855409461958521237316843561264564612914797359441650867687281977652253777871798507040222282496503476810390073910578466324474843250218098944138971813107944594198168111825832451192324619833404602012372774940812851972110247730235941324017510290732261946228996508596337774402423367833795146200696252182322488019921031836794613000426419689977860981501200179977332751413326882591008948361228351024456648485459715610047305541309010194845695912237886570484075679312295666321851762609929131135241734289962368148309781751113642721059303239360001072832490551259676709509615385603211283575578047280881419962039083698002089985828886055661156416740629213964628914205616826113325677709324598004833591815671229525477648747243144549566830390053628928309831579855232829439115282818261490945141011551629708365817465755495522896355025586628268830875104151746
4999930825273776417639569977754844191402927594739069037851707477839207593911886893016618794870530622356073909077832279869798641545167528509966656120623184120128052588408742941658045827255866966100249857968956536613250770326334844204927432961924987891433020671754710428050564671868464658436926086493709176888821257183419013229795869757265111599482263223604228286513011751601176504567030118257385997460972803240338899836840030438830725520798480181575861397469056536579877274090338750406459700907704031830137890544492015701251066934352867527112361743047684237105216779177819594030160887368311805926405114938744235859610328064947158936962470654636736991567663705830950312548447653861922078087824048793236971354828540758657075837209006713701763902429652486225300535997260665898927924843608750347193892239342462507130025307878412116604096773706728162016134101751551184021079984480254041743057914746472840768175369369852937574401874295943063507273467384747124843744395375119899278823903202010381949145094804675442110869084589592876721655764753871572233276245590041302887094585204427900634246823674277680009401177473636685542700515621164233992970974893989913447733956146698563285998205950467321954304'

isSecureQR = (isSecureQr(qrData)) returns True.

  • Decode qrData using pyaadhaar:

     secure_qr = AadhaarSecureQr(int(qrData))
     decoded_secure_qr_data = secure_qr.decodeddata()
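
The steps above rely on knowing what kind of payload the QR reader returned: pyzbar yields bytes, and a Secure QR payload is one long run of decimal digits. As a rough illustration only, the check could look like the sketch below. `classify_qr_payload` is a hypothetical helper, not how `isSecureQr` is actually implemented, and the "starts with `<`" test for the older format is an assumption that the old QR code carries an XML document.

```python
import re


def classify_qr_payload(payload):
    """Return 'secure', 'old-xml', or 'unknown' for a decoded QR payload."""
    # pyzbar yields bytes; other readers may yield str. Normalise to str first.
    if isinstance(payload, (bytes, bytearray)):
        text = bytes(payload).decode("utf-8", errors="replace")
    else:
        text = payload
    text = text.strip()
    if re.fullmatch(r"[0-9]+", text):
        return "secure"    # one long base-10 number
    if text.startswith("<"):
        return "old-xml"   # assumption: the older format is an XML document
    return "unknown"       # e.g. the vCard sample shown earlier


print(classify_qr_payload(b"237497180427052647783300246878"))
print(classify_qr_payload("BEGIN:VCARD\nVERSION:2.1"))
```

A payload classified as "secure" can then be passed through `int(...)` before it is handed to the decoder.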
    

Complete code sample:

import cv2
from pyzbar.pyzbar import decode
from pyaadhaar.utils import isSecureQr
from pyaadhaar.deocde import AadhaarSecureQr

img = cv2.imread('QR-code.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

code = decode(gray)  # List of detected codes; empty if nothing was found.
if not code:
    raise SystemExit('No QR code detected')

qrData = code[0].data  # Payload as bytes (a long string of decimal digits).

isSecureQR = isSecureQr(qrData)

if isSecureQR:
    secure_qr = AadhaarSecureQr(int(qrData))  # Pass an int, not str/bytes.
    decoded_secure_qr_data = secure_qr.decodeddata()
    print(decoded_secure_qr_data)

Output:

{'email_mobile_status': '3', 'referenceid': '269720190308114407437', 'name': 'Sumit Kumar', 'dob': '01-01-1984', 'gender': 'M', 'careof': 'C/O Ishwar Chand', 'district': 'East Delhi', 'landmark': '', 'house': 'B-31, 3rd Floor', 'location': '', 'pincode': '110051', 'postoffice': 'Krishna Nagar', 'state': 'Delhi', 'street': 'Radhey Shyam Park Extension', 'subdistrict': 'Gandhi Nagar', 'vtc': 'Krishna Nagar', 'adhaar_last_4_digit': '2697', 'adhaar_last_digit': '7', 'email': 'yes', 'mobile': 'yes'}
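
Note that this sample output contains only the last four digits of the Aadhaar number, not all twelve. In this sample, `referenceid` begins with the same four digits (`2697`), and the remainder looks like a timestamp. The sketch below splits it on that assumption. The `YYYYMMDDhhmmss` plus milliseconds layout is inferred from this one sample, and `split_reference_id` is a hypothetical helper, not part of pyaadhaar.

```python
from datetime import datetime


def split_reference_id(reference_id):
    """Split a reference id into (last four Aadhaar digits, timestamp)."""
    last_four, stamp = reference_id[:4], reference_id[4:]
    # Assumed layout: YYYYMMDDhhmmss followed by three millisecond digits.
    moment = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return last_four, moment.replace(microsecond=int(stamp[14:17]) * 1000)


last_four, generated_at = split_reference_id("269720190308114407437")
print(last_four, generated_at.isoformat())  # 2697 2019-03-08T11:44:07.437000
```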


Your original code also works with the above image:

from pyaadhaar.utils import Qr_img_to_text, isSecureQr
from pyaadhaar.deocde import AadhaarSecureQr

qrData = Qr_img_to_text('QR-code.png')

isSecureQR = isSecureQr(qrData[0])

if isSecureQR:
    secure_qr = AadhaarSecureQr(int(qrData[0]))
    decoded_secure_qr_data = secure_qr.decodeddata()
    print(decoded_secure_qr_data)
Rotem
  • Hey @Rotem, the Aadhaar card contains sensitive personal information, hence I'm unable to share a sample from our database. I had to make do with a sample image freely available on the internet. I'm trying to find a usable image on the internet and will update the question if I find one. – Piyush Makhija Sep 22 '21 at 03:17
  • I think I found a good example [here](https://uidai.gov.in/ecosystem/authentication-devices-documents/qr-code-reader.html). I have updated my post. I am not sure if the sample is "old" or "new". – Rotem Sep 22 '21 at 09:50
  • Awesome solution! So a regular Python QR library can read it; you just need the special `pyaadhaar` library to check whether it is secure and then decode it. – nathancy Sep 22 '21 at 10:02
  • @nathancy I read the [documentation](https://uidai.gov.in/images/resource/User_manulal_QR_Code_15032019.pdf)... The QR library returns a long string of decimal digits that is decoded with special library. I don't know if it's a new format or an old format. I don't know if my solution solves the posted question (I can't see a 12 digits number). I am waiting for a response from the OP. – Rotem Sep 22 '21 at 10:29
  • @Rotem Thanks for sharing this. I am also able to decode the Secure QR code output with the image you provided, but that is a sample image from the UIDAI website. I'm guessing UIDAI doesn't want us to receive the full 12-digit Aadhaar number upon decoding, so only the last 4 digits are provided in the decoded output. I'm still unable to decode the Secure QR on documents in our production system. – Piyush Makhija Sep 24 '21 at 07:55
  • However, the same code/libraries still don't work on the documents in our production DB. I think I will now have to dig into these libraries to get to the bottom of this... – Piyush Makhija Sep 24 '21 at 07:56
  • One thing that I noticed is that `base10encodedstring` is of type `int`, not `str`, so you shouldn't get the error `AttributeError: 'str' object has no attribute 'to_bytes'`. Note that the value `5000` in `to_bytes(5000, 'big')` may be too small (the payload may need more than 5000 bytes?). Good luck! – Rotem Sep 24 '21 at 08:46
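
The two points in the last comment (the payload has to be an `int`, and a fixed 5000-byte buffer may be too small) can be sketched in plain Python. This is an illustration only: `decimal_string_to_bytes` is a hypothetical helper, not part of pyaadhaar, and it sizes the buffer from the value instead of hard-coding a length.

```python
def decimal_string_to_bytes(payload):
    """Convert a base-10 QR payload (str or bytes) to its big-endian bytes."""
    # int() accepts str and bytes alike, so pyzbar's bytes output works too.
    number = int(payload)
    # Size the buffer from the value itself instead of hard-coding 5000 bytes.
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, "big")


# A str has no to_bytes method, which is the AttributeError from the question.
assert not hasattr("12345", "to_bytes")

print(decimal_string_to_bytes("65537"))  # b'\x01\x00\x01'
```

Because the length is derived from `bit_length()`, there are no leading zero bytes to strip afterwards.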

For anyone who needs to extract a clean QR code ROI before actually decoding it, here's a simple approach to extract the QR code using thresholding, morphological operations, and contour filtering.

  1. Obtain binary image. Load image, grayscale, Gaussian blur, Otsu's threshold

  2. Connect individual QR contours. Create a rectangular structuring kernel with cv2.getStructuringElement() then perform morphological operations with cv2.MORPH_CLOSE.

  3. Filter for QR code. Find contours and filter using contour approximation, contour area, and aspect ratio.


Here's the image processing pipeline

Load image, grayscale, Gaussian blur, then Otsu's threshold to get a binary image


Now we create a rectangular kernel and morph close to combine the QR code into one contour


We find contours and filter for the QR code using contour area, contour approximation, and aspect ratio. The detected QR code is highlighted in green


Extracted ROI


Code

import cv2
import numpy as np

# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)

# Find contours and filter for QR code using contour area, approximation, and aspect ratio
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]  # 2 return values in OpenCV 2.x/4.x, 3 in OpenCV 3.x
ROI = None  # Stays None if no contour passes the filter
for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.04 * peri, True)
    x,y,w,h = cv2.boundingRect(approx)
    area = cv2.contourArea(c)
    ar = w / float(h)
    if len(approx) == 4 and area > 1000 and (ar > .85 and ar < 1.3):
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        ROI = original[y:y+h, x:x+w]
        # cv2.imwrite('ROI.png', ROI)

# Display
cv2.imshow('thresh', thresh)
cv2.imshow('close', close)
cv2.imshow('image', image)
if ROI is not None:  # Avoid a NameError/crash when no QR-like contour was found
    cv2.imshow('ROI', ROI)

# Save images
# cv2.imwrite('thresh.png', thresh)
# cv2.imwrite('close.png', close)
# cv2.imwrite('image.png', image)
# cv2.imwrite('ROI.png', ROI)
cv2.waitKey()
nathancy
  • Thanks for your response @nathancy. But I am already able to extract the ROI from this image, and with the libraries mentioned above you don't need to identify the ROI, as they take care of this piece. This problem is specific to reading and decoding QR codes on Indian Aadhaar cards. What we really want to do is decode the Secure QR code from a given Indian Aadhaar card image to extract the 12-digit number for document verification. – Piyush Makhija Sep 20 '21 at 08:14

Thanks for posting the question. I am the author of aadhaar-py. The code raises an exception because the data passed to the library cannot be parsed; it has to be in a specific format to be parsable. Please refer to the following link for an example: https://uidai.gov.in/te/ecosystem-te/authentication-devices-documents-te/qr-code-reader-te.html

If you scan the QR code present on the page and pass the data received to the library, you'll receive the extracted data. P.S.: The library has been revamped with a new API. Be sure to check it out :) https://pypi.org/project/aadhaar-py/


I am the author of the pyaadhaar library. The Secure QR actually has a binary encoding with the data in XML format. The QR code you provided is not in that format, so decoding it throws an error. Try with a physical copy of an Aadhaar card.

If you are interested in the decoding technique for the Aadhaar QR code, go through this PDF:

UIDAI Secure Qr Specification.pdf
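
As a rough, self-contained illustration of the kind of pipeline involved, the sketch below round-trips a synthetic payload: decimal string, to big integer, to bytes, to gzip decompression, to delimited fields. The gzip step mirrors the `zlib.decompress(..., wbits=16+zlib.MAX_WBITS)` call visible in the aadhaar-py traceback in the question. The `0xFF` field delimiter, the ISO-8859-1 text encoding, and the field order are assumptions made for this example, and `encode_fields` / `decode_payload` are hypothetical names. Consult the specification for the real layout.

```python
import gzip
import zlib

DELIMITER = b"\xff"  # assumed field separator (byte value 255)


def encode_fields(fields):
    """Build a synthetic payload: join fields, gzip them, emit a decimal string."""
    raw = DELIMITER.join(f.encode("iso-8859-1") for f in fields)
    return str(int.from_bytes(gzip.compress(raw), "big"))


def decode_payload(decimal_string):
    """Reverse the steps: decimal string -> int -> bytes -> gunzip -> fields."""
    number = int(decimal_string)
    compressed = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # wbits=16+MAX_WBITS tells zlib to expect a gzip header.
    raw = zlib.decompress(compressed, wbits=16 + zlib.MAX_WBITS)
    return [part.decode("iso-8859-1") for part in raw.split(DELIMITER)]


sample = encode_fields(["3", "269720190308114407437", "Sumit Kumar"])
print(decode_payload(sample))  # ['3', '269720190308114407437', 'Sumit Kumar']
```

Feeding a number that was not produced this way (for example the dummy value `123456`) makes `zlib.decompress` fail with an "incorrect header check" error, which matches the traceback reported in the question.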