
From the spark-nlp GitHub page I downloaded a .zip file containing a pre-trained NerCrfModel. The zip contains three folders: embeddings, fields, and metadata.

How do I load that into a Scala NerCrfModel so that I can use it? Do I have to drop it into HDFS or onto the host where I launch my Spark shell? How do I reference it?

Thiago Custodio
Marsellus Wallace

1 Answer


You just need to provide the path to the folder that contains the directories you mentioned:

import com.johnsnowlabs.nlp.annotators.ner.crf.NerCrfModel

val path = "path/to/unzipped/file/folder"
val model = NerCrfModel.read.load(path)

// use your model
model.setInputCols(someCol)   // the annotation column(s) your pipeline produces
model.transform(yourData)     // yourData must contain 'someCol'
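
For a bit more context, here is a rough sketch of wiring the loaded model into a DataFrame. The column names ("sentence", "token", "pos", "word_embeddings", "ner") and the annotatedData variable are placeholder assumptions; use whatever annotation columns your own pipeline actually produces and check the model's expected inputs:

val ner = NerCrfModel.read.load(path)
  .setInputCols(Array("sentence", "token", "pos", "word_embeddings")) // assumed upstream annotation columns
  .setOutputCol("ner")                                                // assumed output column name

// annotatedData is a DataFrame that has already been run through the upstream annotators
val withEntities = ner.transform(annotatedData)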

As far as I remember, you can place the folder on the local file system or on a distributed file system (e.g. HDFS). Hope this helps other users as well!
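
To illustrate (the directory names below are made-up placeholders): the load path should be resolved through Spark's Hadoop FileSystem handling, so both of these forms should work; note that with a plain local path on a multi-node cluster, the folder needs to be readable from every executor.

val localModel = NerCrfModel.read.load("file:///home/me/ner_crf_model")  // local file system path
val hdfsModel  = NerCrfModel.read.load("hdfs:///models/ner_crf_model")   // HDFS path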

best, Alberto.

AlbertoAndreotti
  • For users of Java: you need to cast the result of `load(...)` to the required class. And somehow I think the Spark NLP API here should be consistent with Spark's `PretrainedPipeline.fromDisk()`. – martin_wun Oct 12 '21 at 08:17