Rename duplicate pdf name by increasing counter

Question

I've written a script that extracts a certain string (the beneficiary name) from PDFs and renames each file after that string. The problem is duplicates: when two PDFs yield the same name, is there a way to append an incrementing counter to the name?

My (inefficient) code is as follows. I'd appreciate any help!

for filename in os.listdir(input_dir):
    if filename.endswith('.pdf'):
        input_path = os.path.join(input_dir, filename)


pdf_array = (glob.glob(input_dir + '*.pdf'))

for current_pdf in pdf_array:
    with pdfplumber.open(current_pdf) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()

        keyword = text.split('\n')[2]

        try:

            if 'attention' in keyword:

                pdf_to_att = text.split('\n')[2]
                start_to_att = 'For the attention of: '
                to_att = pdf_to_att.removeprefix(start_to_att)
                pdf.close()
                result = to_att
                os.rename(current_pdf, result + '.pdf')
                
            else:

                pdf_to_ben = text.split('\n')[1]
                start_to_ben = 'Beneficiary Name : '
                end_to_ben = pdf_to_ben.rsplit(' ', 1)[1]
                to_ben = pdf_to_ben.removeprefix(start_to_ben).removesuffix(end_to_ben).rstrip()
                pdf.close()
                result = to_ben
                os.rename(current_pdf, result + '.pdf')
                
        except Exception:
            pass

messagebox.showinfo("A Title", "Done!")

Edit: the desired output should be:

AAA.pdf

AAA_2.pdf

BBB.pdf

CCC.pdf

CCC_2.pdf

Check whether this helps: https://stackoverflow.com/questions/13852700/create-file-but-if-name-exists-add-number – Hetal Thaker, Mar 15 '21 at 07:17

score 0 · Answer 1 · answered Mar 15 '21 at 07:52

What you want is to build a filename string that includes a counter; let's call it cnt. Python's f-string syntax exists for exactly this purpose: it lets you interpolate a variable into a string.

Initialize your counter before the for loop:

cnt = 0

Replace

os.rename(current_pdf, result + '.pdf')

with

os.rename(current_pdf, f'{result}_{cnt}.pdf')
cnt += 1

The f before the opening quote introduces the f-string, and the curly braces {} let you include any Python expression. In your case we just substitute the values of the two variables result and cnt, then increment the counter.
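As a small illustration of the interpolation described above, here is a sketch with hypothetical values standing in for result and cnt (they are not taken from the question's PDFs):

```python
# Hypothetical values standing in for the variables in the question.
result = 'AAA'
cnt = 1

# Any Python expression is allowed inside the braces.
with_counter = f'{result}_{cnt}.pdf'
next_name = f'{result}_{cnt + 1}.pdf'

print(with_counter)  # AAA_1.pdf
print(next_name)     # AAA_2.pdf
```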

It works, but it adds _0, _1, ... to every file; I would like to add the counter only when there is a duplicate. – coconutxyz, Mar 15 '21 at 09:37

score 0 · Answer 2 · LiQiang · 2021-03-16T04:34:42.997

os.path.isfile can do what you need: check whether the target name already exists, and append a counter only if it does.

import os


def get_new_name(result):
    """Return result.pdf if that name is free, else result_2.pdf, result_3.pdf, ..."""
    pdf_name = result + '.pdf'  # AAA.pdf
    file_number = 2
    # The name is taken: try AAA_2.pdf, AAA_3.pdf, ... until one is free.
    while os.path.isfile(pdf_name):
        pdf_name = '{}_{}.pdf'.format(result, file_number)
        file_number += 1

    return pdf_name


I updated the code to match your output format, and it works now.
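To show where such a helper could fit into the rename step, here is a minimal sketch that runs in a temporary directory. It uses a variant of get_new_name with an added directory parameter; that parameter, the scan*.pdf file names, and the extracted mapping are assumptions made for illustration and do not come from the original code.

```python
import os
import tempfile


def get_new_name(result, directory='.'):
    """Return a free path: result.pdf, else result_2.pdf, result_3.pdf, ..."""
    candidate = os.path.join(directory, result + '.pdf')
    file_number = 2
    while os.path.isfile(candidate):
        candidate = os.path.join(directory, f'{result}_{file_number}.pdf')
        file_number += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source files mapped to the name extracted from each one.
    extracted = {'scan1.pdf': 'AAA', 'scan2.pdf': 'AAA', 'scan3.pdf': 'BBB'}
    for source in extracted:
        open(os.path.join(tmp, source), 'w').close()  # create empty dummy files

    # Rename each file, asking for a free name just before the rename.
    for source, result in extracted.items():
        os.rename(os.path.join(tmp, source), get_new_name(result, tmp))

    renamed = sorted(os.listdir(tmp))

print(renamed)  # ['AAA.pdf', 'AAA_2.pdf', 'BBB.pdf']
```

In the question's loop, this would correspond to replacing result + '.pdf' in the os.rename call with get_new_name(result).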

edited Mar 16 '21 at 04:34

answered Mar 15 '21 at 08:00


I tried to slot it in after "result = to_att", but it doesn't work. – coconutxyz Mar 15 '21 at 09:57
I updated the code and tested it; you can give it a try. – LiQiang Mar 16 '21 at 04:36
Hi, I'm thankful for the answer. I chose another one because I'm too much of a noob for functions xD – coconutxyz Mar 17 '21 at 08:41
I have read the answer you chose; it's clearer. – LiQiang Mar 17 '21 at 09:06

score 0 · Accepted Answer · answered Mar 16 '21 at 04:51

I would use a dict to record the occurrence count of each filename.

dict.get(key, default) returns the value for key if key is in the dictionary, else default. If default is not given, it defaults to None.
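A quick sketch of that behaviour, using a hypothetical counts dictionary:

```python
# Hypothetical counter: 'AAA' has been seen twice, 'BBB' never.
counts = {'AAA': 2}

seen = counts.get('AAA', 0)     # key present: its value is returned
unseen = counts.get('BBB', 0)   # key missing: the given default is returned
no_default = counts.get('BBB')  # key missing, no default: None is returned

print(seen, unseen, no_default)  # 2 0 None
```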

pdf_name_count = {}

for current_pdf in pdf_array:
    with pdfplumber.open(current_pdf) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()

        keyword = text.split('\n')[2]

        try:

            if 'attention' in keyword:
                ...
                result = to_att
                
            else:
                ...
                result = to_ben

            filename_count = pdf_name_count.get(result, 0)
            if filename_count >= 1:
                filename = f'{result}_{filename_count+1}.pdf'
            else:
                filename = result + '.pdf'
            os.rename(current_pdf, filename)
            # increase the name occurrence by 1
            pdf_name_count[result] = filename_count + 1

        except Exception:
            pass
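The naming logic above can be tried on its own, without any PDFs. The following is a minimal sketch; the unique_names function and the sample list are hypothetical and exist only to exercise the counting step.

```python
def unique_names(results):
    """Map each extracted name to a filename, suffixing repeats with _2, _3, ..."""
    pdf_name_count = {}
    filenames = []
    for result in results:
        filename_count = pdf_name_count.get(result, 0)
        if filename_count >= 1:
            # Second occurrence gets _2, third gets _3, and so on.
            filenames.append(f'{result}_{filename_count + 1}.pdf')
        else:
            filenames.append(result + '.pdf')
        # Increase the name occurrence by 1.
        pdf_name_count[result] = filename_count + 1
    return filenames


names = unique_names(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'])
print(names)  # ['AAA.pdf', 'AAA_2.pdf', 'BBB.pdf', 'CCC.pdf', 'CCC_2.pdf']
```

Note that the dict only knows about names produced during the current run. If a file such as AAA.pdf might already exist on disk from an earlier run, an additional os.path.isfile check would be needed.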
