I have a Word file that contains many references like:

[1] M.-H. Han, Y. Li, and C.-H. Hwang, "The impact of high-frequency characteristics induced by intrinsic parameter fluctuations in nano-MOSFET device and circuit," Microelectronics Reliability, vol. 50, pp. 657-661, 2010.

[2] E. Maricau and G. Gielen, "Computer-aided analog circuit design for reliability in nanometer CMOS," Emerging and Selected Topics in Circuits and Systems, IEEE Journal on, vol. 1, pp. 50-58, 2011. . . .

Some of these references may be similar but appear under different numbers. Is there a way to find or delete such similar references with Python? Thanks.

Cindy Meister
  • Welcome to stackoverflow.com. Please learn how to create a [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve). – Brown Bear Jul 24 '18 at 06:14
  • How are you defining "similar"? Are these references similar? If not, can you give examples that are, and explain why they count as similar? – abarnert Jul 24 '18 at 06:17
  • Please check [How to Ask](https://stackoverflow.com/help/how-to-ask) before asking a question – U13-Forward Jul 24 '18 at 06:21
  • See [How do I find the duplicates in a list and create another list with them?](https://stackoverflow.com/questions/9835762/how-do-i-find-the-duplicates-in-a-list-and-create-another-list-with-them) – Peter Wood Jul 24 '18 at 06:22

1 Answer

You can split the footnote text into a number and the rest of the reference:

>>> footnote = '[1] P. Wood, "Example Thesis," Some collection, pp 45-46, 2018'
>>> number, reference = footnote.split(' ', 1)
>>> reference
'P. Wood, "Example Thesis," Some collection, pp 45-46, 2018'
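
If the number is not always followed by exactly one space, a regular expression may be more forgiving than `split`. This is only a minimal sketch, assuming every footnote starts with a bracketed integer; `parse_footnote` and `FOOTNOTE_RE` are hypothetical names, not part of the approach above:

```python
import re

# Assumed format: "[<digits>]" at the start, any amount of whitespace, then the reference.
FOOTNOTE_RE = re.compile(r'^\s*\[(\d+)\]\s*(.*)$', re.DOTALL)

def parse_footnote(footnote):
    """Return (number, reference); raise ValueError if the text is not a numbered footnote."""
    match = FOOTNOTE_RE.match(footnote)
    if match is None:
        raise ValueError(f"not a numbered footnote: {footnote!r}")
    number, reference = match.groups()
    # Convert the number to int and trim stray whitespace around the reference.
    return int(number), reference.strip()
```

Unlike `split(' ', 1)`, this returns the number as an `int` rather than as the string `'[1]'`.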

Python's `collections` module has a Counter class, which is useful for building histograms.

You can add the references to a Counter object and then query it for references that occur more than once:

>>> from collections import Counter

>>> counter = Counter()
>>> counter[reference] += 1

You can loop over all your footnotes (here `footnotes` is a list of footnote strings):

>>> for footnote in footnotes:
...     number, reference = footnote.split(' ', 1)
...     counter[reference] += 1

Then access the counts that are greater than 1:

>>> duplicates = [item for item, count in counter.most_common()
...               if count > 1]
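
If you also want to see which numbers share a reference, one option is to group the numbers by reference text instead of only counting them. The sketch below swaps the Counter for a `defaultdict` of lists for that reason. The `footnotes` list is made-up sample data and `find_duplicates` is a hypothetical helper. It only catches references that are identical after collapsing whitespace and ignoring case, so references that are merely "similar" would need a fuzzier comparison.

```python
from collections import defaultdict

def find_duplicates(footnotes):
    """Map each repeated reference text to the list of numbers it appears under."""
    numbers_by_reference = defaultdict(list)
    for footnote in footnotes:
        number, reference = footnote.split(' ', 1)
        # Assumption: references that differ only in spacing or letter case are the same.
        key = ' '.join(reference.split()).casefold()
        numbers_by_reference[key].append(number)
    # Keep only the references that occur under more than one number.
    return {ref: nums for ref, nums in numbers_by_reference.items() if len(nums) > 1}

# Made-up sample data for illustration only.
footnotes = [
    '[1] P. Wood, "Example Thesis," Some collection, pp 45-46, 2018',
    '[2] A. Author, "Another Title," Some journal, vol. 1, 2017',
    '[3] P. Wood,  "Example thesis," Some collection, pp 45-46, 2018',
]

for reference, numbers in find_duplicates(footnotes).items():
    print(numbers, reference)
```

With the sample data, entries `[1]` and `[3]` are reported together because they differ only in spacing and capitalisation.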
Peter Wood