Apache Stanbol can analyze text in many different languages. So far the following languages are supported (precision and recall vary from language to language):
- English,
- 中文 (Chinese),
- Español (Spanish),
- Русский (Russian),
- Português (Portuguese),
- Deutsch (German),
- Italiano (Italian),
- Nederlands (Dutch),
- Svenska (Swedish),
- Dansk (Danish),
- العربية (Arabic),
- עברית (Hebrew),
- 日本語 (Japanese).
The analysis returns the entities it discovers in the text. The output format can be:
- JSON-LD,
- RDF/XML,
- RDF/JSON,
- Turtle,
- N-Triples.
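As a rough illustration of how the output format is chosen, here is a minimal sketch that posts plain text to a Stanbol enhancer and selects the serialization through the HTTP Accept header. The endpoint URL (a local instance at http://localhost:8080/enhancer), the function names, and the mapping of formats to media types are my assumptions, so check them against the documentation of the Stanbol version you run.

```python
import urllib.request

# Assumed mapping from output format to the Accept media type.
# Verify these values against your Stanbol installation.
ACCEPT_TYPES = {
    "JSON-LD": "application/json",
    "RDF/XML": "application/rdf+xml",
    "RDF/JSON": "application/rdf+json",
    "Turtle": "text/turtle",
    "N-Triples": "text/rdf+nt",
}

# Hypothetical default: a Stanbol launcher running locally.
DEFAULT_ENDPOINT = "http://localhost:8080/enhancer"


def build_enhancer_request(text, output_format="JSON-LD", endpoint=DEFAULT_ENDPOINT):
    """Build (but do not send) a POST request carrying the text to analyze."""
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": ACCEPT_TYPES[output_format],
        },
        method="POST",
    )


def enhance(text, output_format="JSON-LD", endpoint=DEFAULT_ENDPOINT):
    """Send the text to the enhancer and return the serialized result as a string."""
    request = build_enhancer_request(text, output_format, endpoint)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running instance, something like `print(enhance("Paris is the capital of France.", "Turtle"))` should print the discovered entities in the requested serialization.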
The entities extracted from a text, and therefore its tagging, can be further tailored through the system configuration. Ideally, any custom vocabulary can be plugged into the system.
There are a couple of demo end-points:
I am not sure whether all of the languages listed above are supported by these demo endpoints.
RedLink GmbH is going to provide cloud services based on Apache Stanbol and related software.
The WordLift plugin for WordPress already provides text analysis within WordPress for all of the languages listed above (currently in the testing stage). You can try it out by installing the plugin in WordPress and submitting text in the post body.
You can also subscribe and write to the Apache Stanbol mailing list for specific requests or information.