
I'm encountering an issue while using Spark with Solr in Django. I have a running Solr server (version 8.5.2) and spark-solr 4.0.3, installed from the spark-solr 4.0.3 package.

If I call the command

jsonDF = spark.read.schema(schema2).json(json_file)

with the json_file

json_file = '''
    [
        {
            "name": "John Doe",
            "age": 30,
            "city": "New York"
        },
        {
            "name": "Jane Smith",
            "age": 25,
            "city": "London"
        },
        {
            "name": "Bob Johnson",
            "age": 35,
            "city": "Paris"
        }
    ]
    '''

I get the error

java.net.URISyntaxException: Relative path in absolute URI: 
    [
        {
            "name":%20%22John%20Doe%22,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22age%22:%2030,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22city%22:%20%22New%20York%22%0A%20%20%20%20%20%20%20%20%7D,%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22name%22:%20%22Jane%20Smith%22,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22age%22:%2025,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22city%22:%20%22London%22%0A%20%20%20%20%20%20%20%20%7D,%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22name%22:%20%22Bob%20Johnson%22,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22age%22:%2035,%0A%20%20%20%20%20%20%20%20%20%20%20%20%22city%22:%20%22Paris%22%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%5D%0A%20%20%20%20

A call like

csvDF = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(csvFile)

is processed without any error, and a search can be run over the data.

The only source I could find that discusses the problem is the Stack Overflow question What is causing "java.net.URISyntaxException: Relative path in absolute URI" when submit spark job? (the second answer):

this is a livy bug, and fixed in latest version in Griffin, you may find the answer in

 https://issues.apache.org/jira/browse/GRIFFIN-248?jql=project%20%3D%20GRIFFIN%20AND%20issuetype%20%3D%20Bug%20AND%20text%20~%20%22%2525%22

The answer refers to Error: java.net.URISyntaxException: Relative path in absolute URI, which states:

The change I did was just to replace the "\" before trying to parse the json.
Class: Application.scala
lines: 44-45, in the moment of parsing the arguments.

However, this comes from the SparkSubmitJob.java in function setLivyArgs(), where there is a workaround for a livy bug.

The livy version used when encountered "java.net.URISyntaxException: Relative path in absolute URI: " was 0.6.0-incubating.

A comment under the issue is:

Hi amykatz007,

I encountered the same issue and i managed to workaround it by removing the "\" from the rule definition of the measure. 

EG:

"rule" : "count(source.subtype) AS \`subtype_count\`" 

 

becomes

 

"rule" : "count(source.subtype) AS `subtype_count`"

 



Unfortunately, I don't really understand the context and whether the answer actually has anything to do with the error. I would be very grateful for explanations and possible solutions.

    What is unclear from the docs that say the parameter to json() function must be a file path? Not a JSON string – OneCricketeer Jun 19 '23 at 23:53
  • For the minimal example, I declared the json content in the variable and forgot to mention that a .json is referenced in the actual code. As mentioned in your note, the method only takes a file path. This was overlooked when creating the minimal example. In the JSON file, the JSON content was probably specified as a JSON string when the error occurred. When checking the file, no leading or trailing quotes were set, but it seems plausible that a JSON string was specified in the file instead of the JSON. Why I (presumably) put the quotes in the file is beyond me. Thanks for the comment. – Lukas Trenz Jun 20 '23 at 08:35
  • The docs also say the json file content shouldn't have any indentation – OneCricketeer Jun 20 '23 at 12:15
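
As the comments point out, json() expects a file path (or an RDD), not a JSON string, and by default Spark expects JSON Lines, i.e. one complete JSON document per line with no indentation. A pretty-printed array like the one above needs the multiLine option. A minimal sketch, assuming the array is saved to a hypothetical file people.json and the same spark session as in the question:

# "people.json" is a hypothetical file holding the pretty-printed JSON array
# shown above; multiLine makes Spark parse the whole file as one JSON document
# instead of expecting one document per line (JSON Lines).
jsonDF = spark.read.option("multiLine", "true").json("people.json")
jsonDF.show()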

1 Answer


With spark.read.json you can pass either a JSON file path or an RDD to the json method.

As you are passing a JSON string, it is treated as a file path. Since that path is not valid, you get the error.

Convert the JSON string to an RDD and pass it to the json method to fix the issue:

# Wrap the JSON string in an RDD so spark.read.json parses its content
# instead of treating the string as a file path.
rdd = sc.parallelize([json_file])
df = spark.read.json(rdd)
df.show()

+---+--------+-----------+
|age|    city|       name|
+---+--------+-----------+
| 30|New York|   John Doe|
| 25|  London| Jane Smith|
| 35|   Paris|Bob Johnson|
+---+--------+-----------+
Mohana B C
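
For completeness, a sketch of an alternative that avoids the RDD detour: parse the string with Python's json module and build the DataFrame directly. This assumes the same spark session and the json_file string from the question; createDataFrame can also take the question's schema2 as a second argument.

import json

# Parse the JSON array in the driver, then build the DataFrame
# from the resulting list of dicts.
rows = json.loads(json_file)
df = spark.createDataFrame(rows)
df.show()

Either way the whole string is parsed on the driver, so for anything beyond a small sample, reading an actual file (as in the real code) is preferable.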