I've written a script that pulls a certain string (the beneficiary) out of PDFs and renames each file based on that string. The problem is duplicates: is there any way to append a counter to the name when the same string occurs more than once?

My (inefficient) code is as follows; I'd appreciate any help!

for filename in os.listdir(input_dir):
    if filename.endswith('.pdf'):
        input_path = os.path.join(input_dir, filename)


pdf_array = (glob.glob(input_dir + '*.pdf'))

for current_pdf in pdf_array:
    with pdfplumber.open(current_pdf) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()

        keyword = text.split('\n')[2]

        try:

            if 'attention' in keyword:

                pdf_to_att = text.split('\n')[2]
                start_to_att = 'For the attention of: '
                to_att = pdf_to_att.removeprefix(start_to_att)
                pdf.close()
                result = to_att
                os.rename(current_pdf, result + '.pdf')
                
            else:

                pdf_to_ben = text.split('\n')[1]
                start_to_ben = 'Beneficiary Name : '
                end_to_ben = pdf_to_ben.rsplit(' ', 1)[1]
                to_ben = pdf_to_ben.removeprefix(start_to_ben).removesuffix(end_to_ben).rstrip()
                pdf.close()
                result = to_ben
                os.rename(current_pdf, result + '.pdf')
                
        except Exception:
            pass

messagebox.showinfo("A Title", "Done!")

Edit: the desired output should be:

AAA.pdf

AAA_2.pdf

BBB.pdf

CCC.pdf

CCC_2.pdf

coconutxyz

3 Answers

What you want is to build a filename string that includes a counter; let's call it cnt. Python's f-string syntax exists for exactly this purpose: it lets you interpolate a variable into a string.

Initialize your counter before the for loop:

cnt = 0

Replace

os.rename(current_pdf, result + '.pdf')

with

os.rename(current_pdf, f'{result}_{cnt}.pdf')
cnt += 1

The f before the opening quote introduces the f-string, and the curly braces {} let you include any Python expression; here we just substitute the values of the two variables result and cnt. Then we increment the counter.
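
As a quick illustration of the interpolation itself, here is a minimal sketch. The values assigned to result and cnt are made up for the example and stand in for the variables from the question.

```python
# Hypothetical values standing in for the variables in the question.
result = "AAA"
cnt = 2

# Each {...} is replaced by the value of the expression inside it.
new_name = f"{result}_{cnt}.pdf"
print(new_name)  # AAA_2.pdf
```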

joao
  • It works by adding _0, _1, ... to every file, but I would like to add the counter only when there is a duplicate. – coconutxyz Mar 15 '21 at 09:37

os.path.isfile can do what you need: check whether the target name already exists before renaming.

import os


def get_new_name(result):
    """Return 'result.pdf', or 'result_N.pdf' (N >= 2) if that name is taken."""
    pdf_name = result + '.pdf'  # first choice, e.g. AAA.pdf
    file_number = 2
    # Try AAA_2.pdf, AAA_3.pdf, ... until a name that does not exist yet is found.
    # Names are checked relative to the current working directory.
    while os.path.isfile(pdf_name):
        pdf_name = '{}_{}.pdf'.format(result, file_number)
        file_number += 1

    return pdf_name

I updated the code to match your output format; it should work.
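
A minimal sketch of the same existence check in use, assuming the renamed files all go into one known folder. The names unique_pdf_path, directory, and stem are illustrative and not part of the answer above; the demo creates empty placeholder files in a temporary directory rather than touching real PDFs.

```python
import os
import tempfile


def unique_pdf_path(directory, stem):
    """Return a free path: stem.pdf, or stem_2.pdf, stem_3.pdf, ... if taken."""
    candidate = os.path.join(directory, stem + ".pdf")
    number = 2
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{number}.pdf")
        number += 1
    return candidate


# Demonstration in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    names = []
    for stem in ["AAA", "AAA", "BBB", "AAA"]:
        path = unique_pdf_path(tmp, stem)
        open(path, "w").close()  # create an empty placeholder file
        names.append(os.path.basename(path))

print(names)  # ['AAA.pdf', 'AAA_2.pdf', 'BBB.pdf', 'AAA_3.pdf']
```

Because the check looks at the disk, this approach also avoids clashing with files left over from an earlier run, at the cost of one or more filesystem lookups per rename.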

LiQiang

I would use a dict to record the occurrence count of each filename.

dict.get(key, default) returns the value for key if key is in the dictionary, otherwise default. If default is not given, it defaults to None.

pdf_name_count = {}

for current_pdf in pdf_array:
    with pdfplumber.open(current_pdf) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()

        keyword = text.split('\n')[2]

        try:

            if 'attention' in keyword:
                ...
                result = to_att
                
            else:
                ...
                result = to_ben

            filename_count = pdf_name_count.get(result, 0)
            if filename_count >= 1:
                filename = f'{result}_{filename_count+1}.pdf'
            else:
                filename = result + '.pdf'
            os.rename(current_pdf, filename)
            # increase the name occurrence by 1
            pdf_name_count[result] = filename_count + 1

        except Exception:
            pass
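
The counting logic can also be tried on its own, without any PDFs. This is a minimal sketch: numbered_names is a hypothetical helper written for this illustration, and the input list is made-up data mirroring the desired output in the question.

```python
def numbered_names(results):
    """Map extracted strings to filenames, adding _2, _3, ... to repeats."""
    counts = {}
    names = []
    for result in results:
        seen = counts.get(result, 0)  # 0 if this name has not occurred yet
        if seen >= 1:
            names.append(f"{result}_{seen + 1}.pdf")
        else:
            names.append(result + ".pdf")
        counts[result] = seen + 1  # record this occurrence
    return names


print(numbered_names(["AAA", "AAA", "BBB", "CCC", "CCC"]))
# ['AAA.pdf', 'AAA_2.pdf', 'BBB.pdf', 'CCC.pdf', 'CCC_2.pdf']
```

Note that the dict only knows about names seen during the current run, so it will not notice a file such as AAA.pdf that already exists on disk from an earlier run.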
Ynjxsjmh