How to check whether a sentence is correct (simple grammar check in Python)?

Question

How to check whether a sentence is valid in Python?

Examples:

I love Stackoverflow - Correct
I Stackoverflow love - Incorrect

Nowadays, this could be practically solved using one of those large-scale language models. I don't have the expertise to devise such a process, however. Maybe there will be a ready-made solution available, either in some PhD thesis or (a few years later) as an open-source project. – user7610, Oct 08 '22 at 10:33

user7610 · Answer 1 · 2023-05-18T11:02:22.440

There are various web services providing automated proofreading and grammar checking. Some have a Python library to simplify querying.

As far as I can tell, most of those tools (certainly After the Deadline and LanguageTool) are rule-based. The checked text is compared against a large set of rules describing common errors. If a rule matches, the software reports an error. If no rule matches, the software reports nothing (it cannot detect errors it has no rules for).
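To make the rule-based idea concrete, here is a minimal sketch using only the standard library. The two rules and their IDs are made up for illustration (they loosely mimic the LanguageTool rules shown further down) and are far cruder than what the real tools ship.

```python
import re

# Hypothetical mini rule set: (rule id, pattern, suggestion).
# Real checkers ship thousands of far more careful rules.
RULES = [
    ("A_VS_AN", re.compile(r"\ba (?=[aeiou])", re.IGNORECASE), "an "),
    ("TOT_HE", re.compile(r"\btot he\b", re.IGNORECASE), "to the"),
]


def check(text):
    """Return (rule_id, offset, matched_text, suggestion) for every rule hit."""
    hits = []
    for rule_id, pattern, suggestion in RULES:
        for m in pattern.finditer(text):
            hits.append((rule_id, m.start(), m.group(0), suggestion))
    return sorted(hits, key=lambda hit: hit[1])


errors = check("A sentence with a error in the Hitchhiker's Guide tot he Galaxy")
for rule_id, offset, found, suggestion in errors:
    print(f"{rule_id} at {offset}: {found!r} -> {suggestion!r}")

# No rule describes a scrambled word order, so nothing is reported here.
print(check("I Stackoverflow love"))
```

The last call shows the limitation described above: a sentence can be wrong in a way no rule covers, and the checker then stays silent.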

After the Deadline

import ATD

ATD.setDefaultKey("your API key")  # replace with your own key
errors = ATD.checkDocument("Looking too the water. Fixing your writing typoss.")
for error in errors:
    # print() with a single argument works on both Python 2 and Python 3
    print("%s error for: %s **%s**" % (error.type, error.precontext, error.string))
    print("some suggestions: %s" % (", ".join(error.suggestions),))

Expected output:

grammar error for: Looking **too the**
some suggestions: to the
spelling error for: writing **typoss**
some suggestions: typos

It is possible to run the server application on your own machine; 4 GB of RAM is recommended.

LanguageTool

https://pypi.python.org/pypi/language-check

>>> import language_check
>>> tool = language_check.LanguageTool('en-US')
>>> text = 'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
>>> matches = tool.check(text)

>>> matches[0].fromy, matches[0].fromx
(0, 16)
>>> matches[0].ruleId, matches[0].replacements
('EN_A_VS_AN', ['an'])
>>> matches[1].fromy, matches[1].fromx
(0, 50)
>>> matches[1].ruleId, matches[1].replacements
('TOT_HE', ['to the'])

>>> print(matches[1])
Line 1, column 51, Rule ID: TOT_HE[1]
Message: Did you mean 'to the'?
Suggestion: to the
...

>>> language_check.correct(text, matches)
'A sentence with an error in the Hitchhiker’s Guide to the Galaxy'

It is also possible to run the server side privately.

Ginger

Additionally, there is a hacky (screen-scraping) library for Ginger, arguably one of the most polished free-to-use grammar checkers available.

Microsoft Word

It should be possible to script Microsoft Word and use its grammar checking functionality.

More

There is a curated list of grammar checkers on the Open Office website, as noted in the comments by patrick.

I haven't tried the others, but FWIW LanguageTool doesn't *quite* provide the requested behavior. For example, both `I love you.` and `I you love.` parse as completely valid. — Ponkadoodle, Jul 17 '16 at 05:11
just to add, most of these apparently use the [Open Office Grammar](https://www.openoffice.org/lingucomponent/grammar.html) checker; their website has a list of similar services and [some open source implementations](http://grac.sourceforge.net/). (fun fact -- several grammar errors in their docs). — patrick, Jul 17 '16 at 15:39
@patrick My understanding is that these projects are mostly independent. The only relationship with Open Office is that they integrate. They can hook to Open Office API and provide grammar suggestions from inside Open Office. The list of checkers is useful, though. Thanks. — user7610, Mar 31 '17 at 14:35

score 28 · Accepted Answer · edited Jul 17 '16 at 05:15

Check out NLTK. It has support for grammars that you can use to parse your sentence. You can define a grammar, or use one that is provided, along with a context-free parser. If the sentence parses, then it has valid grammar; if not, then it doesn't. These grammars may not have the widest coverage (e.g., one might not know how to handle a word like StackOverflow), but this approach lets you say specifically what is valid or invalid in the grammar. Chapter 8 of the NLTK book covers parsing and should explain what you need to know.
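To show what "if the sentence parses, it is valid" means in practice, here is a rough sketch of a context-free recognizer (the CYK algorithm) written in plain Python rather than with NLTK, so it runs without any install. The grammar and lexicon are hypothetical and cover only the words from the question.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Binary rules: a pair of child symbols -> parent symbols that can produce it.
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> symbols that can produce it.
LEXICON = {
    "i": {"NP"},
    "stackoverflow": {"NP"},
    "love": {"V"},
}


def is_grammatical(sentence):
    """CYK recognition: True if the toy grammar derives the sentence from S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the symbols that derive words[i:j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))  # unknown word -> empty set
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                left, right = table[start][split], table[split + 1][end]
                for pair in product(left, right):
                    table[start][end] |= BINARY.get(pair, set())
    return "S" in table[0][n - 1]


print(is_grammatical("I love Stackoverflow"))  # True
print(is_grammatical("I Stackoverflow love"))  # False
print(is_grammatical("I love Python"))         # False: "python" is not in the lexicon
```

The third call illustrates the coverage problem mentioned above: a perfectly fine sentence is rejected simply because the grammar does not know one of its words.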

An alternative would be to write a Python interface to a wide-coverage parser (like the Stanford parser or C&C). These are statistical parsers that will be able to understand sentences even if they haven't seen all the words or all the grammatical constructions before. The downside is that sometimes the parser will still return a parse for a sentence with bad grammar, because it uses the statistics to make the best guess possible.
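The "best guess" behaviour can be illustrated with something much simpler than a statistical parser. The sketch below is a toy bigram language model with add-one smoothing, not a parser, and its four-sentence corpus is invented for the example. It only shows the general trade-off: every input gets a score, so nothing is ever rejected outright.

```python
import math
from collections import Counter

# Tiny hypothetical training corpus; a real model would use millions of sentences.
CORPUS = [
    "i love stackoverflow",
    "i love python",
    "you love stackoverflow",
    "i use python",
]

unigrams = Counter()
bigrams = Counter()
for line in CORPUS:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens[:-1])  # count each token as a bigram context
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len({tok for line in CORPUS for tok in line.split()} | {"</s>"})


def log_prob(sentence):
    """Log-probability under a bigram model with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams at a small non-zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE))
    return total


good = log_prob("I love Stackoverflow")
bad = log_prob("I Stackoverflow love")
print(good, bad)
print(good > bad)  # the scrambled sentence scores lower, but it still gets a score
```

Turning such a score into a yes/no answer requires picking a threshold, which is exactly the part that is hard to get right.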

So, it really depends on exactly what your goal is. If you want very precise control over what is considered grammatical, use a context-free parser with NLTK. If you want robustness and wide coverage, use a statistical parser.

edited Jul 17 '16 at 05:15

Ponkadoodle

answered Apr 20 '12 at 19:34

dhg

I checked the NLTK documentation - https://nltk.googlecode.com/svn/trunk/doc/howto/parse.html. It shows that we have to define a grammar first. But if I don't know the sentence structure of the input, how can I do that? – ChamingaD Apr 21 '12 at 16:56
@ChamingaD, Do you mean you don't understand how to define the context-free grammar (CFG)? If this is the case, you should probably just do a search for information on CFGs and read up so you understand how to define your grammar. – dhg Apr 21 '12 at 17:19
@ChamingaD The link that 'dhg' suggested was [Chapter 8.](http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html) You can find your way to the 'grammars' [here ←](http://stackoverflow.com/a/6115756/1217270) – Honest Abe Apr 21 '12 at 18:38
Thanks @dhg and Honest Abe. I went through the documentation a little and checked the sample. In that CFG, do we have to define nouns and verbs? E.g., N -> "man" | "dog" | "cat" | "telescope" | "park". – ChamingaD Apr 21 '12 at 18:58
This is not usable advice (especially the comments). Writing an explicit CFG for a non-trivial fragment of English is an impossible task, unless you have a large team and lots of time. Almost NOBODY uses hand-written rules for real-world text. Statistical techniques are much more powerful, but they cannot easily say "this is ungrammatical". The OP's problem is a lot harder than this answer suggests. – alexis Sep 14 '15 at 15:54
@alexis +1 There are projects maintaining hand-written parsers for world languages, though. For example, https://www.grammaticalframework.org/ and helpful intro lecture for it at https://www.youtube.com/watch?v=x1LFbDQhbso. – user7610 Nov 27 '21 at 12:16
Right, @user7610, it just takes a large team and lots of time. – alexis Nov 28 '21 at 21:47

score 8 · Answer 3 · answered May 07 '20 at 19:46

Some other answers have mentioned LanguageTool, the largest open-source grammar checker. It didn't have a reliable, up-to-date Python port until now.

I recommend language_tool_python, a grammar checker that supports Python 3 and the latest versions of Java and LanguageTool. It's the only up-to-date, free Python grammar checker. (full disclosure, I made this library)

answered May 07 '20 at 19:46

jxmorris12

Very nice. @jxmorris, what machine (how much RAM) do you recommend? Please advise. – Serhiy, Sep 04 '20 at 19:40
@Serhiy `language_tool_python` runs fine on my laptop (Macbook Pro 15"). I don't think RAM should be a bottleneck. – jxmorris12 Sep 08 '20 at 15:49

score 6 · Answer 4 · answered Oct 12 '20 at 08:44

I would suggest language-tool-python. For example:

import language_tool_python
tool = language_tool_python.LanguageTool('en-US')

text = "Your the best but their are allso  good !"
matches = tool.check(text)
len(matches)

and we get:

4

We can have a look at the 4 issues that it found:

1st Issue:

matches[0]

And we get:

Match({'ruleId': 'YOUR_YOU_RE', 'message': 'Did you mean "You\'re"?', 'replacements': ["You're"], 'context': 'Your the best but their are allso  good !', 'offset': 0, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

2nd Issue:

matches[1]

and we get:

Match({'ruleId': 'THEIR_IS', 'message': 'Did you mean "there"?', 'replacements': ['there'], 'context': 'Your the best but their are allso  good !', 'offset': 18, 'errorLength': 5, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'})

3rd Issue:

matches[2]

and we get:

Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['also', 'all so'], 'context': 'Your the best but their are allso  good !', 'offset': 28, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

4th Issue:

matches[3]

and we get:

Match({'ruleId': 'WHITESPACE_RULE', 'message': 'Possible typo: you repeated a whitespace', 'replacements': [' '], 'context': 'Your the best but their are allso  good!', 'offset': 33, 'errorLength': 2, 'category': 'TYPOGRAPHY', 'ruleIssueType': 'whitespace'})
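As a rough sketch of what you can do with these results, the snippet below copies the `offset`, `errorLength` and `replacements` fields from the four matches shown above into plain dicts, so it runs without Java or the LanguageTool download. The helper names `is_valid` and `apply_first_suggestions` are my own, not part of the library; with real `Match` objects you would read the same fields as attributes, or use the library's own `tool.correct(text)`.

```python
# Fields copied from the four Match objects shown above, kept as plain dicts
# so this sketch runs without Java or the LanguageTool download.
text = "Your the best but their are allso  good !"
matches = [
    {"offset": 0, "errorLength": 4, "replacements": ["You're"]},
    {"offset": 18, "errorLength": 5, "replacements": ["there"]},
    {"offset": 28, "errorLength": 5, "replacements": ["also", "all so"]},
    {"offset": 33, "errorLength": 2, "replacements": [" "]},
]


def is_valid(found):
    """Treat a sentence as valid only when the checker reports nothing."""
    return len(found) == 0


def apply_first_suggestions(source, found):
    """Replace each flagged span with its first suggestion."""
    corrected = source
    # Work from the end of the string so earlier offsets stay valid.
    for match in sorted(found, key=lambda m: m["offset"], reverse=True):
        if not match["replacements"]:
            continue  # nothing to suggest for this match
        start = match["offset"]
        end = start + match["errorLength"]
        corrected = corrected[:start] + match["replacements"][0] + corrected[end:]
    return corrected


print(is_valid(matches))
print(apply_first_suggestions(text, matches))
```

Keep in mind that "no matches" only means no rule fired; as noted in the comments under another answer, a sentence like `I you love.` can still pass.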

If you are looking for a more detailed example, you can have a look at the related post on Predictive Hacks.

K.E.S · Answer 5 · 2022-09-04T04:36:10.143

Step 1

pip install Caribe

Step 2

import Caribe as cb

sentence = "I is playing football"
output = cb.caribe_corrector(sentence)  # returns the corrected sentence
print(output)

edited Sep 04 '22 at 04:36

answered Aug 28 '22 at 05:34

K.E.S


score 0 · Answer 6 · answered May 25 '23 at 08:49

Based on my research, I am sharing my analysis here.

For more accurate and specialized grammar and spell-checking, you can consider using dedicated libraries and tools such as pyaspeller, pyspellchecker, or language-tool-python. These libraries are specifically designed for grammar and spell-checking tasks and may provide better accuracy compared to a general-purpose language model like GPT-3.

Step 1

pip install pyaspeller
pip install language-tool-python

Step 2

from pyaspeller import YandexSpeller
import language_tool_python

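def error_correcting(text):
    """Return text with LanguageTool's suggested corrections applied."""
    tool = language_tool_python.LanguageTool('en-US')
    corrected = tool.correct(text)
    return corrected

def error_correct_pyspeller(sample_text):
    """Return text with spelling fixed by the Yandex speller (via pyaspeller)."""
    speller = YandexSpeller()
    fixed = speller.spelled(sample_text)
    return fixed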

input_text = """
This is a sample paragrap with some incorrect spellings and grammer mistaks.
It's importnt to check larje text chunks for accurcy and improve readibility.
Gingerit is a great library for such tasks, and it can handl larje text as well.

Let's try processing this larje text using Gingerit.
"""

output_data = error_correcting(input_text)
print(output_data)

output_text = error_correct_pyspeller(input_text)
print(output_text)
