
I have a list of PDF files:

l=['ab.pdf', 'cd.pdf', 'ef.pdf', 'gh.pdf']

Some of these four files are duplicates of each other; only the names differ. How can I remove the duplicates from the list?

For example, ab.pdf and cd.pdf have the same contents, so the final output should be:

l=['ab.pdf', 'ef.pdf', 'gh.pdf']

I have tried the filecmp library, but it only tells me whether two given files are duplicates.

How can I do this most efficiently, in a Pythonic way?

halfer
Kallol
  • How would I know that they're duplicates? :| – cs95 Jun 27 '19 at 14:04
  • they should be having same contents – Kallol Jun 27 '19 at 14:05
  • And how do I know that? I don't have the contents? – cs95 Jun 27 '19 at 14:06
  • Possible duplicate of [In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique \*while preserving order\*?](https://stackoverflow.com/questions/89178/in-python-what-is-the-fastest-algorithm-for-removing-duplicates-from-a-list-so) – Tianmin Lyu Jun 27 '19 at 14:06
  • @KallolSamanta You could check memory size (could be bug prone) but if it is a small scale project that would work. You can also just read all the text using [this](https://www.geeksforgeeks.org/working-with-pdf-files-in-python/) and compare the text. – Error - Syntactical Remorse Jun 27 '19 at 14:08
  • What issue are you having with `filecmp`? Isn't that what you want? – Error - Syntactical Remorse Jun 27 '19 at 14:12
  • @Error-SyntacticalRemorse filecmp just says whether two files are duplicates. I can use a loop and get the job done, but I don't want to use a loop; I am looking for something more efficient. – Kallol Jun 27 '19 at 14:16
  • @KallolSamanta filecmp is the most effective though, and don't you want to check for duplicates? – Error - Syntactical Remorse Jun 27 '19 at 14:17
  • @Error-SyntacticalRemorse I don't want to use any loop – Kallol Jun 27 '19 at 14:18
  • @KallolSamanta you can't iterate over a list without a loop. Why don't you want to use loops? – alec_djinn Jun 27 '19 at 14:31
  • It's OK to note in a question why it is not a duplicate, but one has to explain how it is different from the proposed duplicate(s). Merely stating that it is different is insufficient. – halfer Mar 25 '20 at 16:13

1 Answer


You need to give the program the path to each file on your computer and compare the files' contents one by one, although that does not sound very efficient.
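One way to avoid comparing every pair of files is to hash each file once and keep only the first path seen for each hash. The following is a minimal sketch, not a definitive solution: it assumes "duplicate" means byte-for-byte identical, so two PDFs that look the same but differ in metadata would be treated as different. The names `file_digest` and `drop_duplicate_files` are hypothetical helpers introduced here for illustration, and the entries in the list are assumed to be valid paths to readable files.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def drop_duplicate_files(paths):
    """Keep the first path for each distinct file content, preserving order."""
    seen = set()
    unique = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

With the list from the question, `l = drop_duplicate_files(l)` would read each file once. A loop over the list is still needed, but there is no pairwise comparison. If byte-level certainty is required, `filecmp.cmp(a, b, shallow=False)` could be used to confirm files whose digests match.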