I am trying to run a DBSCAN implementation on PySpark (GitHub link of the complete implementation). I copied the code from the README.md and ran it unchanged. The code is:

import dbscan
from sklearn.datasets import make_blobs

from pyspark.sql import types as T, SparkSession

from scipy.spatial import distance

spark = SparkSession \
        .builder \
        .appName("DBSCAN") \
        .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.3-s_2.11") \
        .config('spark.driver.host', '127.0.0.1') \
        .getOrCreate()
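# NOTE: `centers` was not defined in the posted snippet; these values are an assumed placeholder.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=5)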
data = [(i, [float(item) for item in X[i]]) for i in range(X.shape[0])]
schema = T.StructType([T.StructField("id", T.IntegerType(), False),
                               T.StructField("value", T.ArrayType(T.FloatType()), False)])
df = spark.createDataFrame(data, schema=schema)
df_clusters = dbscan.process(spark, df, .2, 10, distance.euclidean, 2, "checkpoint")

The last line fails with the following error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
in ()
     12                                T.StructField("value", T.ArrayType(T.FloatType()), False)])
     13 df = spark.createDataFrame(data, schema=schema)
---> 14 df_clusters = dbscan.process(spark, df, .2, 10, distance.euclidean, 2, "checkpoint")

4 frames

/usr/local/lib/python3.6/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o185.createGraph.
: java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
    at org.graphframes.GraphFrame$.apply(GraphFrame.scala:676)
    at org.graphframes.GraphFramePythonAPI.createGraph(GraphFramePythonAPI.scala:10)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:834)

I don't know Java or Scala, and I am a beginner with PySpark. How can I resolve this error?
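A `NoSuchMethodError` on `scala.Predef$.refArrayOps` often points to a JAR compiled against a different Scala version than the one the running Spark uses, and the package coordinate in the snippet names Spark 2.3 and Scala 2.11. As a rough diagnostic sketch (not a confirmed fix), the helper below parses that coordinate and compares it with runtime versions passed in as strings. The function names `parse_coordinate` and `find_mismatches` are hypothetical, and the coordinate pattern is an assumption based on the string used above.

```python
import re

# Assumed coordinate shape: "group:artifact:<version>-spark<X.Y>-s_<A.B>",
# e.g. "graphframes:graphframes:0.7.0-spark2.3-s_2.11" from the snippet above.
_COORDINATE = re.compile(
    r"^[^:]+:[^:]+:(?P<package>[\d.]+)-spark(?P<spark>\d+\.\d+)-s_(?P<scala>\d+\.\d+)$"
)


def parse_coordinate(coordinate):
    """Return the package, Spark and Scala versions encoded in the coordinate."""
    match = _COORDINATE.match(coordinate)
    if match is None:
        raise ValueError("unrecognised coordinate: %r" % coordinate)
    return match.groupdict()


def _major_minor(version):
    """Reduce '2.12.10' to '2.12' so that patch releases are ignored."""
    return ".".join(version.split(".")[:2])


def find_mismatches(coordinate, runtime_spark, runtime_scala):
    """List the version parts where the package build and the runtime differ."""
    built = parse_coordinate(coordinate)
    problems = []
    if _major_minor(runtime_spark) != built["spark"]:
        problems.append("spark: built for %s, running %s" % (built["spark"], runtime_spark))
    if _major_minor(runtime_scala) != built["scala"]:
        problems.append("scala: built for %s, running %s" % (built["scala"], runtime_scala))
    return problems
```

With a live session, the Spark version is available as `spark.version`. The Scala version can usually be read through the private `_jvm` handle, for example `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`, though that relies on an internal attribute. If `find_mismatches` reports a difference, picking a graphframes build whose suffix matches the runtime is the usual thing to try.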

OneCricketeer