I have created a table in Databricks

create table TabA (latitude float, longitude float, col1 string, col2 string)

utils.executequery( """ update TabA set col1 = ST_Envelope(col2)""" ) I tried converting this output as string but getting error as _tostring() not supported
utils.executequery(""" optimize TabA """)

utils.executequery( """ update TabA set latitude = col1.Lat""" )
utils.executequery(""" optimize TabA """)

utils.executequery( """ update TabA set longitude= col1.Long""" )
utils.executequery(""" optimize TabA """)

I am getting the error

col1#22613: need struct type but got string

I tried casting "col1" as a string, but that did not solve the exception. How do I solve it?

  • When creating the table, could you specify `col1 struct` instead of `col1 string`? – ZygD Oct 17 '22 at 09:57
  • Struct datatype is not supported in Databricks: "Error in SQL statement: ParseException: DataType struct is not supported. (line 1, pos 573)" – Vidhya Oct 17 '22 at 10:09
  • According to the [documentation](https://sedona.apache.org/api/sql/Function/#st_envelope), the function `ST_Envelope` takes a *geometry* data type as its argument, but it is not clear what data type it returns. You should check what data type is returned. Is it a struct? (A sketch of how to check is shown after these comments.) – ZygD Oct 17 '22 at 11:06
  • The output I received is of string data type – Vidhya Oct 17 '22 at 12:51
  • According to my test, `ST_Envelope(col2)` returns a _geometry_ data type. My col2 was a polygon. Do you know for sure which line generates your error? – ZygD Oct 17 '22 at 15:31
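As a follow-up to the comment about checking the return type: one way to do it is to run the function once and print the resulting schema. This is only a sketch, assuming the usual `spark` session object, that Sedona's ST_* SQL functions are registered in that session, and that TabA is queryable as a table:

# Hedged sketch: inspect what data type ST_Envelope(col2) actually produces.
spark.sql("select ST_Envelope(col2) as env from TabA").printSchema()
# The printed type of `env` tells you whether it is a geometry (UDT), a string, a struct, etc.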

1 Answer

Your error must be coming from col1.Lat and col1.Long. Since your col1 is a string, you cannot use dot notation such as col1.Lat; that notation is for the struct data type, not for strings.

Consider this example:

df = spark.createDataFrame([('x', (1.0, 2.0))], 'string_col:string, struct_col:struct<lat:double,lon:double>')
df.createOrReplaceTempView('TabA')

df.printSchema()
# root
#  |-- string_col: string (nullable = true)
#  |-- struct_col: struct (nullable = true)
#  |    |-- lat: double (nullable = true)
#  |    |-- lon: double (nullable = true)

df.show()
# +----------+----------+
# |string_col|struct_col|
# +----------+----------+
# |         x|{1.0, 2.0}|
# +----------+----------+

The following SQL query works, because it addresses the struct-type column and successfully extracts the field lon:

spark.sql('select struct_col.lon from TabA').show()
# +---+
# |lon|
# +---+
# |2.0|
# +---+

But the following SQL query fails, because it tries to do the same on the string-type column:

spark.sql('select string_col.lon from TabA').show()

AnalysisException: Can't extract value from string_col#617: need struct type but got string; line 1 pos 7

  • I also know this is the error, but in Databricks I am not able to define the data type as struct when creating the table; that is the issue, and I also don't have the data to create a DataFrame. I need to make the query syntactically correct. – Vidhya Oct 18 '22 at 05:17
  • It seems you don't have struct columns. In other words, "col1" is a column of some other type, but you are treating it as if it were a struct. Probably "col1" is of type geometry. You should read the documentation to learn how to extract values from a geometry-type column; methods that work with struct-type columns (the dot notation) do not necessarily work with geometry-type columns. (A sketch of that approach follows these comments.) – ZygD Oct 18 '22 at 06:16
  • Don't take the error message "need struct type" literally; you don't need a struct in your case. Since you don't have structs, forget about them and learn how to work with your actual data types. – ZygD Oct 18 '22 at 06:19
  • Yeah I understood now thanks a lot for explaining. – Vidhya Oct 18 '22 at 12:35
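For completeness, here is a minimal sketch of what the coordinate extraction could look like with geometry functions instead of dot notation. Everything in it is an assumption rather than something confirmed in the thread: it assumes Apache Sedona's SQL functions are available to utils.executequery, that col1 ends up holding WKT text (the table declares it as string), and that the centroid of the envelope is an acceptable single point to take the latitude/longitude from:

# Hedged sketch: extract coordinates with Sedona geometry functions
# instead of struct dot notation.
utils.executequery("""
    update TabA
    set latitude  = ST_Y(ST_Centroid(ST_GeomFromWKT(col1))),
        longitude = ST_X(ST_Centroid(ST_GeomFromWKT(col1)))
""")
# ST_GeomFromWKT parses the WKT string back into a geometry,
# ST_Centroid reduces the envelope to a single point,
# and ST_Y / ST_X read that point's latitude and longitude.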