
Since ST_GeomFromText is not part of org.apache.spark.sql.functions, Spark does not recognise it. As I understand it, I first need to define a UDF for this function: write the function's implementation, register it with Spark as a UDF, and only then can I use it.

I got stuck right at the beginning: I don't know how to define this function or what parameters it should take.

EDIT

The code I used is as follows:

    sparkSession.udf().register("ST_GeomFromText", new UDF1<String, String>() {
        @Override
        public String call(String txt) {
            return (new ST_GeomFromText(txt));
        }
    }, DataTypes.StringType);

I really need your help.

Thank you

HBoulmi

2 Answers


I think you should use a library like GeoSpark for that. I don't see the function ST_GeomFromText listed there, but it has constructors for other formats such as WKT: https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Constructor/#st_geomfromwkt. It also implements many other functions on geometry types, which I believe will make it much easier to calculate areas, crossing points, intersections and so on if you need to.

I am not sure which database you are using (PostGIS, SQL Server Spatial, etc.). The definition of ST_GeomFromText may differ slightly among them, but WKT should be the same everywhere, as it is a standard text representation of geometry.
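
To make the format concrete: a 2D point in WKT is the keyword POINT followed by the x and y coordinates separated by whitespace, for example POINT(-7.07378166 33.826661). The following is only an illustrative sketch in plain Java, assuming a hypothetical helper class WktPointReader that handles nothing but 2D points; for real data a geometry library's WKT reader is the better choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: reads a 2D WKT POINT and nothing else.
class WktPointReader {

    // A signed decimal number with an optional fraction and exponent.
    private static final String NUMBER = "([-+]?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)";

    // POINT ( x y ), case-insensitive, with flexible whitespace.
    private static final Pattern POINT = Pattern.compile(
            "\\s*POINT\\s*\\(\\s*" + NUMBER + "\\s+" + NUMBER + "\\s*\\)\\s*",
            Pattern.CASE_INSENSITIVE);

    // Returns {x, y}; for geographic data x is the longitude and y the latitude.
    static double[] parse(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a 2D WKT POINT: " + wkt);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        double[] p = parse("POINT(-7.07378166 33.826661)");
        System.out.println("x=" + p[0] + ", y=" + p[1]);
    }
}
```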

Hope this helps

Oscar Lopez M.

Similar questions:

  1. GeoSpark librairy using Spark Java
  2. From ResultSet to Spark dataframe using Java
  3. GeoSpark using Spark / Java
  4. Undefined function: 'ST_GeomFromText' Using Spark / Java

I think you haven't followed the GeoSparkSQL-Overview/#quick-start guide thoroughly:

  1. As per the quick start, you need to add GeoSpark-core and GeoSparkSQL to your project's pom.xml or build.sbt:
<!-- Geo spark lib doc - https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Overview/#quick-start-->
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark-sql_2.3</artifactId>
            <version>1.3.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.vividsolutions/jts -->
        <dependency>
            <groupId>com.vividsolutions</groupId>
            <artifactId>jts</artifactId>
            <version>1.13</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.datasyslab/geospark-viz -->
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark-viz_2.3</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark</artifactId>
            <version>1.3.1</version>
        </dependency>
  2. Declare your Spark session:
SparkSession sparkSession = SparkSession.builder()
                .config("spark.serializer", KryoSerializer.class.getName())
                .config("spark.kryo.registrator", GeoSparkKryoRegistrator.class.getName())
                .master("local[*]")
                .appName("myGeoSparkSQLdemo")
                .getOrCreate();
  3. Register all the functions from geospark-sql_2.3 with the sparkSession so that they can be used directly in Spark SQL:
// register all functions from geospark-sql_2.3 to sparkSession
GeoSparkSQLRegistrator.registerAll(sparkSession);
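
If the project does not compile at this step, or fails at runtime with a NoClassDefFoundError, it can help to check which of the required classes are actually visible on the classpath. Below is a minimal diagnostic sketch using plain Java reflection. It is not part of GeoSpark: the class ClasspathCheck and its helper methods are hypothetical names, and the sketch assumes that the KryoSerializer meant here is Spark's org.apache.spark.serializer.KryoSerializer.

```java
import java.lang.reflect.Method;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical diagnostic helper, not part of Spark or GeoSpark.
class ClasspathCheck {

    // Looks a class up by name without running its static initializers.
    static Optional<Class<?>> find(String className) {
        try {
            Class<?> found =
                    Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return Optional.of(found);
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    // Sorted, de-duplicated names of the public methods (inherited ones included).
    static SortedSet<String> publicMethodNames(Class<?> type) {
        SortedSet<String> names = new TreeSet<>();
        for (Method method : type.getMethods()) {
            names.add(method.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Class names used by this answer; adjust them to your own imports.
        String[] required = {
            "org.apache.spark.serializer.KryoSerializer",
            "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator",
            "org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator"
        };
        for (String name : required) {
            Optional<Class<?>> found = find(name);
            if (!found.isPresent()) {
                System.out.println("MISSING " + name);
                continue;
            }
            try {
                System.out.println("FOUND   " + name + " " + publicMethodNames(found.get()));
            } catch (LinkageError e) {
                // A class referenced by one of the method signatures could not be loaded.
                System.out.println("BROKEN  " + name + " (" + e + ")");
            }
        }
    }
}
```

Run it with the same classpath as the application. A MISSING line points to a dependency that was not resolved, and the method list shows whether registerAll is visible from Java.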

Here is a working example:

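        // Imports needed by this example. KryoSerializer must be Spark's class,
        // not com.twitter.chill.KryoSerializer.
        import org.apache.spark.serializer.KryoSerializer;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;
        import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator;
        import org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator;
        import static org.apache.spark.sql.functions.expr;

        SparkSession sparkSession = SparkSession.builder()
                .config("spark.serializer", KryoSerializer.class.getName())
                .config("spark.kryo.registrator", GeoSparkKryoRegistrator.class.getName())
                .master("local[*]")
                .appName("myGeoSparkSQLdemo")
                .getOrCreate();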

        // register all functions from geospark-sql_2.3 to sparkSession
        GeoSparkSQLRegistrator.registerAll(sparkSession);
        try {
            System.out.println(sparkSession.catalog().getFunction("ST_Geomfromtext"));
            // Function[name='ST_GeomFromText', className='org.apache.spark.sql.geosparksql.expressions.ST_GeomFromText$', isTemporary='true']
        } catch (Exception e) {
            e.printStackTrace();
        }
        // https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Function/
        Dataset<Row> dataframe = sparkSession.sql("select ST_GeomFromText('POINT(-7.07378166 33.826661)')");
        dataframe.show(false);
        dataframe.printSchema();
        /**
         * +---------------------------------------------+
         * |st_geomfromtext(POINT(-7.07378166 33.826661))|
         * +---------------------------------------------+
         * |POINT (-7.07378166 33.826661)                |
         * +---------------------------------------------+
         */

        // using longitude and latitude column from existing dataframe
        Dataset<Row> df = sparkSession.sql("select -7.07378166 as longitude, 33.826661 as latitude");
        df.withColumn("ST_Geomfromtext",
                expr("ST_GeomFromText(CONCAT('POINT(',longitude,' ',latitude,')'))"))
        .show(false);
        /**
         * +-----------+---------+-----------------------------+
         * |longitude  |latitude |ST_Geomfromtext              |
         * +-----------+---------+-----------------------------+
         * |-7.07378166|33.826661|POINT (-7.07378166 33.826661)|
         * +-----------+---------+-----------------------------+
         */
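
The second query builds the WKT text inside SQL with CONCAT. When the coordinates are already available as Java doubles, the same text can be composed on the Java side before it is handed to ST_GeomFromText. The following is a minimal sketch, assuming a hypothetical helper class WktPoint that is not part of Spark or GeoSpark. As in the example above, WKT expects x (longitude) before y (latitude).

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Spark or GeoSpark: formats a 2D WKT POINT.
class WktPoint {

    // Builds "POINT(<longitude> <latitude>)"; WKT order is x first, then y.
    static String of(double longitude, double latitude) {
        if (!Double.isFinite(longitude) || !Double.isFinite(latitude)) {
            throw new IllegalArgumentException("Coordinates must be finite numbers");
        }
        return "POINT(" + plain(longitude) + " " + plain(latitude) + ")";
    }

    // Plain decimal notation, so very small or large values carry no exponent.
    private static String plain(double value) {
        return BigDecimal.valueOf(value).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(of(-7.07378166, 33.826661));
    }
}
```

The result can then be embedded in a query, for example sparkSession.sql("select ST_GeomFromText('" + WktPoint.of(longitude, latitude) + "')").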
Som
  • Thank you for your reply. I tried this code but I get this error: java.lang.NoClassDefFoundError: org/datasyslab/geospark/serde/GeoSparkKryoRegistrator – HBoulmi Jul 10 '20 at 17:26
  • It does not accept this line GeoSparkSQLRegistrator.registerAll(sparkSession); Cannot resolve symbol registerAll – HBoulmi Jul 10 '20 at 17:30
  • Have you added the dependencies with the correct versions? The docs list them in the quick start. – Som Jul 10 '20 at 17:45
  • See if you have imported this correctly: https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/datasyslab/geosparksql/utils/GeoSparkSQLRegistrator.scala – Som Jul 10 '20 at 17:47
  • I did that : import org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator.*; – HBoulmi Jul 10 '20 at 17:52
  • Can you tell me which functions you see in org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator? – Som Jul 10 '20 at 18:57
  • How can I do that ? – HBoulmi Jul 10 '20 at 19:10
  • When I run it I get this error: ERROR SparkContext: Error initializing SparkContext. java.lang.NoSuchMethodException: com.twitter.chill.KryoSerializer.<init>() – HBoulmi Jul 10 '20 at 19:14
  • @Someshwar Kale I'm sorry for the inconvenience, but have you tested the code, and does it work for you? – HBoulmi Jul 10 '20 at 19:49
  • Yes, I have added the output for reference. Check the example section in my answer. – Som Jul 11 '20 at 02:10
  • Please follow this: https://datasystemslab.github.io/GeoSpark/api/sql/GeoSparkSQL-Overview/#quick-start – Som Jul 11 '20 at 02:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/217642/discussion-between-hboulmi-and-someshwar-kale). – HBoulmi Jul 11 '20 at 09:14