I am currently using pywikibot to obtain the categories of a given Wikipedia page (e.g., support-vector machine) as follows.
import pywikibot as pw
page = pw.Page(pw.Site('en'), 'support-vector machine')
print([cat.title() for cat in page.categories()])
The result I get is:
[
'Category:All articles with specifically marked weasel-worded phrases',
'Category:All articles with unsourced statements',
'Category:Articles with specifically marked weasel-worded phrases from May 2018',
'Category:Articles with unsourced statements from June 2013',
'Category:Articles with unsourced statements from March 2017',
'Category:Articles with unsourced statements from March 2018',
'Category:CS1 maint: Uses editors parameter',
'Category:Classification algorithms',
'Category:Statistical classification',
'Category:Support vector machines',
'Category:Wikipedia articles needing clarification from November 2017',
'Category:Wikipedia articles with BNF identifiers',
'Category:Wikipedia articles with GND identifiers',
'Category:Wikipedia articles with LCCN identifiers'
]
As you can see, the results include a lot of Wikipedia tracking and maintenance categories, such as:
- Category:All articles with specifically marked weasel-worded phrases
- Category:All articles with unsourced statements
- Category:CS1 maint: Uses editors parameter
- etc.
However, the only categories I am interested in are:
- Category:Classification algorithms
- Category:Statistical classification
- Category:Support vector machines
I am wondering if there is a way to get all tracking or maintenance Wikipedia categories, so that I can remove them from the results and keep only the informative categories.
Alternatively, please suggest any other ways of eliminating them from the results.
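For reference, the only workaround I have come up with so far is a name-based filter, sketched below. The prefix patterns are my own guesses based on the output above, not an official list of maintenance categories, and `informative_categories` is just a hypothetical helper name. I understand that many of these categories are marked as hidden on Wikipedia, so querying that status would probably be more robust than matching titles, but I do not know how to do that with pywikibot.

```python
import re

# Assumed heuristic: title prefixes that look like tracking/maintenance
# categories in the output above. These are guesses, not an official list.
MAINTENANCE_PATTERN = re.compile(
    r"^Category:(?:"
    r"All articles\b"
    r"|Articles with\b"
    r"|Wikipedia articles\b"
    r"|CS1\b"
    r")"
)


def informative_categories(titles):
    """Return the titles that do not match the assumed maintenance prefixes."""
    return [t for t in titles if not MAINTENANCE_PATTERN.match(t)]


# Category titles copied from the output shown earlier in this question.
titles = [
    'Category:All articles with specifically marked weasel-worded phrases',
    'Category:All articles with unsourced statements',
    'Category:Articles with specifically marked weasel-worded phrases from May 2018',
    'Category:Articles with unsourced statements from June 2013',
    'Category:Articles with unsourced statements from March 2017',
    'Category:Articles with unsourced statements from March 2018',
    'Category:CS1 maint: Uses editors parameter',
    'Category:Classification algorithms',
    'Category:Statistical classification',
    'Category:Support vector machines',
    'Category:Wikipedia articles needing clarification from November 2017',
    'Category:Wikipedia articles with BNF identifiers',
    'Category:Wikipedia articles with GND identifiers',
    'Category:Wikipedia articles with LCCN identifiers',
]

print(informative_categories(titles))
```

This keeps the three categories I want for this page, but it will miss any maintenance category whose name does not start with one of these prefixes, which is why I am looking for something less ad hoc.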
I am happy to provide more details if needed.