I want to mask a number (e.g. a mobile number) in both images and PDF files using Python. My images and PDFs come in different layouts, but I know that any 10-digit number in them is the number I need to mask. I can find it with a regex in the extracted text, but I'm stuck on masking it in the file itself. Please help me resolve this.
For image files:
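import re

from PIL import Image
import pytesseract

filepath = 'path/to/image.png'  # placeholder path
text = pytesseract.image_to_string(Image.open(filepath))
# Replaces standalone 10-digit numbers in the OCR'd string only; the image is unchanged
text = re.sub(r'\b\d{10}\b', 'xxxxxxxxxx', text)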
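One way to mask the number in the image itself, rather than only in the OCR'd string, is to ask Tesseract for word-level bounding boxes with `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` and paint a filled rectangle over each matching word with Pillow's `ImageDraw`. This is a minimal sketch, assuming the number is recognised as a single word (no spaces or dashes inside it); `find_number_boxes`, `mask_boxes`, and `mask_image_file` are hypothetical helper names.

```python
import re

from PIL import Image, ImageDraw

# A word that contains exactly one run of 10 digits, optionally wrapped in
# non-digit characters such as "(" or "," (assumption about how OCR tokenises it)
TEN_DIGIT_WORD = re.compile(r"\D*\d{10}\D*")


def find_number_boxes(ocr_data):
    """Return (left, top, right, bottom) boxes for words holding a 10-digit number.

    `ocr_data` is assumed to be the dict produced by
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT).
    """
    boxes = []
    for i, word in enumerate(ocr_data["text"]):
        if TEN_DIGIT_WORD.fullmatch(str(word).strip()):
            left, top = ocr_data["left"][i], ocr_data["top"][i]
            boxes.append((left, top,
                          left + ocr_data["width"][i],
                          top + ocr_data["height"][i]))
    return boxes


def mask_boxes(image, boxes, fill="black"):
    """Return a copy of `image` with a filled rectangle drawn over each box."""
    masked = image.copy()
    draw = ImageDraw.Draw(masked)
    for box in boxes:
        draw.rectangle(box, fill=fill)
    return masked


def mask_image_file(in_path, out_path):
    """OCR `in_path`, black out 10-digit numbers, and save to `out_path`."""
    import pytesseract  # imported here because it needs the Tesseract binary installed

    image = Image.open(in_path).convert("RGB")
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    mask_boxes(image, find_number_boxes(data)).save(out_path)
```

Usage would be something like `mask_image_file('in.png', 'masked.png')`. If OCR splits a number such as `98765 43210` into two words, this pattern will miss it; joining adjacent words that share the same `line_num` before matching is one possible way to handle that.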
For PDF files:
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

filepath = 'path/to/file.pdf'  # placeholder path

rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

# PDFPage.get_pages() expects an open binary file object, not a path string
with open(filepath, 'rb') as fp:
    for page in PDFPage.get_pages(fp):
        print('Processing next page...')
        interpreter.process_page(page)
        layout = device.get_result()
        for lobj in layout:
            if isinstance(lobj, LTTextBox):
                # bbox is (x0, y0, x1, y1); (x0, y1) is the box's top-left corner
                x, y, text = lobj.bbox[0], lobj.bbox[3], lobj.get_text()
                print('%r: %s' % ((x, y), text))
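pdfminer is an extraction library with no API for writing a modified PDF, so masking needs a library that can edit the file. A minimal sketch, assuming PyMuPDF (imported as `fitz`) is an acceptable swap: `page.get_text("words")` yields each word with its rectangle, and `page.add_redact_annot` followed by `page.apply_redactions()` removes the matched text and fills its area with black, so the number is not merely hidden under a box. `find_number_rects` and `redact_pdf` are hypothetical names, and this only works for PDFs with a text layer; a scanned PDF would have to be rendered to an image and handled with the OCR approach instead.

```python
import re

# Same word-level pattern as for images: one run of exactly 10 digits,
# optionally wrapped in punctuation (assumption about the PDF's tokens)
TEN_DIGIT_WORD = re.compile(r"\D*\d{10}\D*")


def find_number_rects(words):
    """Return (x0, y0, x1, y1) for each word holding a 10-digit number.

    `words` is assumed to be the list from PyMuPDF's page.get_text("words"):
    tuples of (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    return [tuple(w[:4]) for w in words if TEN_DIGIT_WORD.fullmatch(w[4])]


def redact_pdf(in_path, out_path):
    """Redact every 10-digit number in `in_path` and save to `out_path`."""
    import fitz  # PyMuPDF; newer releases also expose it as `pymupdf`

    doc = fitz.open(in_path)
    for page in doc:
        for rect in find_number_rects(page.get_text("words")):
            # fill=(0, 0, 0) paints the redacted area black (default is white)
            page.add_redact_annot(fitz.Rect(rect), fill=(0, 0, 0))
        # Actually removes the text under the redaction annotations
        page.apply_redactions()
    doc.save(out_path)
    doc.close()
```

Usage would be something like `redact_pdf('in.pdf', 'masked.pdf')`. Because the whole word's rectangle is redacted, surrounding punctuation such as a trailing comma is blacked out along with the digits.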