I want to find words in Notepad++ that are not used in any files. Suppose I have a dictionary and a book. I want to find the words from the dictionary that do not appear in the book. How can I do this? Thanks.
This is not a job for Notepad++. Write a script in your favorite scripting language, it is easier and more efficient. – Toto Nov 25 '21 at 18:12
1 Answer
As suggested by Toto, Notepad++ is not the right tool for this job. That being said, it is not impossible in Notepad++. Here is how to do it, using Shakespeare's Sonnet 24 as the book:
Mine eye hath play'd the painter and hath stell'd
Thy beauty's form in table of my heart;
My body is the frame wherein 'tis held,
And perspective it is the painter's art.
For through the painter must you see his skill,
To find where your true image pictured lies;
Which in my bosom's shop is hanging still,
That hath his windows glazed with thine eyes.
Now see what good turns eyes for eyes have done:
Mine eyes have drawn thy shape, and thine for me
Are windows to my breast, where-through the sun
Delights to peep, to gaze therein on thee;
Yet eyes this cunning want to grace their art;
They draw but what they see, know not the heart.
- Format your book so that it consists of one word per line. Start by going to Search->Replace and typing `\b([A-Za-z']+)\b` into the Find what: field and `\1\n` into the Replace with: field. Then ensure the Regular expression radio box is checked and press Replace All. This gives us
Mine
eye
hath
play'd
the
...
they
see
, know
not
the
heart
.
- Remove all punctuation from the document by putting `[ .,;:]` into the Find what field and making sure the Replace with field is empty:
Mine
eye
hath
play'd
the
...
grace
their
art
They
draw
but
what
they
see
know
not
the
heart
- Now copy your dictionary (which I hope is in the form of one word per line) above the text. I will just use an example dictionary containing the words painter, aeroplane, camel, shape, done. Mark the end of the dictionary with something unique so that you can find it later. You should now have
painter
aeroplane
camel
shape
done
----ENDOFDICTIONARY---
Mine
eye
hath
play'd
...
the
heart
- Make everything lowercase by pressing Ctrl-A to select everything and then pressing Ctrl-U
- Open the Replace dialog and put `^(.*?)$\s+?^(?=.*^\1$)` (cf. this answer) into Find what and leave Replace with empty. Ensure the `. matches newline` checkbox (next to the Regular expression radio box) is checked. Now press Replace All and all the words in the dictionary list which appear in the book will be removed:
aeroplane
camel
----ENDOFDICTIONARY---
eye
play'd
stell'd
beauty's
...
The words above ----ENDOFDICTIONARY--- will be those which appear nowhere in the text.
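For comparison, here is a minimal Python sketch of the scripting route Toto suggested. It is only one possible approach, not a definitive implementation: it assumes the dictionary is already a list of words, reuses the `[A-Za-z']+` character class from the first step to split the book into words, and compares everything in lowercase. The function name `unused_words` and the sample data are made up for illustration.

```python
import re

# Same character class as the "Find what" pattern in the first step:
# a word is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def unused_words(dictionary_words, book_text):
    """Return the dictionary words (in their original order) that never
    occur in book_text. The comparison is case-insensitive."""
    seen = {word.lower() for word in WORD_RE.findall(book_text)}
    return [word for word in dictionary_words if word.lower() not in seen]


# Illustrative data: three lines of the sonnet and the example dictionary.
book = """Mine eye hath play'd the painter and hath stell'd
Now see what good turns eyes for eyes have done:
Mine eyes have drawn thy shape, and thine for me"""
dictionary = ["painter", "aeroplane", "camel", "shape", "done"]

print(unused_words(dictionary, book))  # prints ['aeroplane', 'camel']
```

For real files you would read the book into one string and the dictionary into a list of stripped lines before calling the function. Because the book's words are held in a set, each dictionary lookup is cheap, so this should stay fast even for a large dictionary.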

Dominic Price
By the way, you can probably do this without splitting each word onto its own line by modifying the regexes. I was hoping to be able to just use the `Remove duplicate lines` function, but this didn't seem to work for me. – Dominic Price Nov 25 '21 at 18:45