Modify text in an image and a PDF in Python

Question

I want to mask a specific number (e.g. a mobile number) in an image and also in a PDF, using Python. I have different types of image and PDF files, but I know that any 10-digit number in them is the number I need to hide. I can find it using a regex, but I got stuck on the masking step. Please help me resolve this issue.

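For image files:

import re
from PIL import Image
import pytesseract

filepath = 'image.png'  # path to the input image
text = pytesseract.image_to_string(Image.open(filepath))
# Note: this masks the OCR'd string only; the image pixels are unchanged
text = re.sub(r'\d{10}', 'xxxxxxxxxx', text)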
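The pattern `\d{10}` used above also matches the first ten digits of a longer run (for example an 11-digit account number). A slightly stricter sketch uses lookarounds so only standalone ten-digit runs are replaced, and it also returns their character offsets, which is the information you need before you can locate the number on the page. The name `mask_ten_digit_numbers` is illustrative only.

```python
import re

# Exactly ten consecutive digits: the lookarounds reject runs that are
# part of a longer number (e.g. an 11-digit account number).
TEN_DIGITS = re.compile(r'(?<!\d)\d{10}(?!\d)')


def mask_ten_digit_numbers(text, mask_char='x'):
    """Return (masked_text, spans), where spans are the (start, end)
    offsets of every ten-digit number in the original text."""
    spans = [m.span() for m in TEN_DIGITS.finditer(text)]
    masked = TEN_DIGITS.sub(lambda m: mask_char * len(m.group()), text)
    return masked, spans


if __name__ == '__main__':
    sample = 'Call 9876543210 or 12345678901 (11 digits, left alone).'
    print(mask_ten_digit_numbers(sample))
```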

For PDF files:

from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator

rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

# PDFPage.get_pages() expects an open binary file object, not a path string
with open('filepath', 'rb') as fp:
    for page in PDFPage.get_pages(fp):
        print('Processing next page...')
        interpreter.process_page(page)
        layout = device.get_result()
        for lobj in layout:
            if isinstance(lobj, LTTextBox):
                # bbox is (x0, y0, x1, y1) with the origin at the bottom-left,
                # so (bbox[0], bbox[3]) is the box's top-left corner
                x, y, text = lobj.bbox[0], lobj.bbox[3], lobj.get_text()
                print('%r: %s' % ((x, y), text))

score 0 · Answer 1 · answered Sep 16 '19 at 09:57

You may want to try these.

Since Tesseract gives you the bounding-box coordinates of every word and line it recognises, you can use them to find the box that covers the ten digits (stretching it across neighbouring boxes if the number was split) and then draw a filled rectangle over that region of the image as a mask.

CypherX

But I am unable to blur the text in the PDF file. P.S. I don't want to convert the PDF to an image. – Abhishak Varshney Sep 16 '19 at 18:24
I may have found a solution. I'll post an update within the next 24 hours. – CypherX Sep 18 '19 at 08:31
Thanks, I got the solution. Anyone interested in the same problem can use python-reportlab to blur or mask the number. – Abhishak Varshney Sep 18 '19 at 09:35
That’s great! Could you please also post the solution here as an answer to your own question? – CypherX Sep 18 '19 at 12:50