Questions tagged [johnsnowlabs-spark-nlp]

John Snow Labs’ Spark NLP is a natural language processing library built on top of Apache Spark ML pipelines.

100 questions
8
votes
2 answers

unable to download the pipeline provided by spark-nlp library

I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library. I tried installing different versions of pyspark and the spark-nlp library. import sparknlp from sparknlp.pretrained import PretrainedPipeline #create…
bhawana
  • 81
  • 1
  • 2
8
votes
1 answer

Do Spark-NLP pretrained pipelines only work on linux systems?

I am trying to set up simple code where I pass a dataframe and test it with the pretrained explain pipeline provided by the John Snow Labs Spark-NLP library. I am using Jupyter notebooks from Anaconda and have a Spark Scala kernel set up using apache…
6
votes
2 answers

spark-nlp: DocumentAssembler initialization failing with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'

I am trying out the ContextAwareSpellChecker provided in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc The first component in the pipeline is a DocumentAssembler from sparknlp.annotator import…
Abhishek P
  • 189
  • 2
  • 9
6
votes
1 answer

How should we use the setDictionary for the lemmatization annotator in Spark-NLP?

I have a requirement where I have to add a dictionary in the lemmatization step. While trying to use it in a pipeline and calling pipeline.fit(), I get an ArrayIndexOutOfBounds exception. What is the correct way to implement this? Are there any…
5
votes
3 answers

After installing sparknlp, cannot import sparknlp

The following ran successfully on a Cloudera CDSW cluster gateway. import pyspark from pyspark.sql import SparkSession spark = (SparkSession .builder .config("spark.jars.packages","JohnSnowLabs:spark-nlp:1.2.3") …
4
votes
1 answer

spark-nlp 'JavaPackage' object is not callable

I am using jupyter lab to run spark-nlp text analysis. At the moment I am just running the sample code: import sparknlp from pyspark.sql import SparkSession from sparknlp.pretrained import PretrainedPipeline #create or get Spark Session #spark =…
4
votes
1 answer

SparkNLP Sentiment Analysis in Java

I want to use SparkNLP for doing sentiment analysis on a spark dataset on column column1 using the default trained model. This is my code: DocumentAssembler docAssembler = (DocumentAssembler) new DocumentAssembler().setInputCol("column1") …
4
votes
1 answer

Spark Python Pyspark How to flatten a column with an array of dictionaries and embedded dictionaries (sparknlp annotator output)

I'm trying to extract the output from sparknlp (using the pretrained pipeline 'explain_document_dl'). I have spent a lot of time looking for ways (UDFs, explode, etc.) but cannot get anywhere close to a workable solution. Say I want to extract…
Peggy
  • 83
  • 2
  • 10
4
votes
1 answer

How to load a spark-nlp pre-trained model from disk

From the spark-nlp GitHub page I downloaded a .zip file containing a pre-trained NerCrfModel. The zip contains three folders: embeddings, fields, and metadata. How do I load that into a Scala NerCrfModel so that I can use it? Do I have to drop it…
3
votes
1 answer

TypeError: 'JavaPackage' object is not callable | using java 11 for spark 3.3.0, sparknlp 4.0.1 and sparknlp jar from spark-nlp-m1_2.12

I got the spark-nlp jar from https://jar-download.com/artifacts/com.johnsnowlabs.nlp/spark-nlp-m1_2.12/4.0.1/source-code JAVA_HOME = C:\Program Files\Java\jdk-18.0.1.1 is set in the system variables and the admin user's variables. ''' import pyspark from…
3
votes
1 answer

Sentence similarity with SparkNLP only works on Google Dataproc with ONE sentence, FAILS when multiple sentences are provided

I deployed the following Colab Python code (see link below) to Dataproc on Google Cloud. It only works when input_list is an array with one item; when input_list has two items, the PySpark job dies with the following error on line "for r…
3
votes
1 answer

Where can I find a list of class labels for pretrained SparkNLP NerDLModel?

I have been searching for a while but have had no luck finding out which NER labels are included in the pretrained NerDL (TensorFlow) model. I would think the training data can provide such information, but I do not see it mentioned in any…
ZEE
  • 188
  • 1
  • 12
3
votes
1 answer

How to use JohnSnowLabs NLP Spell correction module NorvigSweetingModel?

I was going through the JohnSnowLabs SpellChecker here. I found the implementation of Norvig's algorithm there, and the example section has just the following two lines: import…
user3243499
  • 2,953
  • 6
  • 33
  • 75
2
votes
1 answer

Wrong or missing inputCols annotators - spark-nlp

I'm new to NLP and started with the spark-nlp package for Python. I trained a simple NER model, which I saved and now want to use. However, I am facing the problem of wrong or missing inputCols, despite the dataframe looking accurate. What am I…
padraig
  • 31
  • 6
2
votes
1 answer

How to extract embeddings generated from sparknlp WordEmbeddingsModel to feed a RNN model using keras and tensorflow

I have a text classification problem. I'm particularly interested in this embedding model in sparknlp because I have a dataset from Wikipedia in the 'sq' language. I need to convert the sentences of my dataset into embeddings. I do so by…