I have this simple JSON file:
{"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},"front":{"alarm":false,"muted":false}},"lane":{"keepAssist":{"right":false,"left":false}}}}
But when I try to read it like this:
spark.read.option("inferSchema", "true") \
    .option("multiline", "true") \
    .json("///myfile.json") \
    .first() \
    .asDict()
I get:
{"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},"front":{"alarm":false,"muted":false}},"lane":{"keepAssist":{"alarm":false,"muted":false}}}}
Which is wrong, because the fields of adas_lane_keepAssist are not correct: they come back as alarm/muted instead of right/left.
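To pin down exactly which fields differ, here is a minimal sketch in plain Python (no Spark involved). `leaf_paths` is a hypothetical helper I wrote for this comparison, and the two strings are the source JSON and the output copied from above:

```python
import json


def leaf_paths(obj, prefix=""):
    """Return the set of underscore-joined paths to every non-dict leaf."""
    if not isinstance(obj, dict):
        return {prefix}
    paths = set()
    for key, value in obj.items():
        child = f"{prefix}_{key}" if prefix else key
        paths |= leaf_paths(value, child)
    return paths


# The source file content and the result I got back, copied verbatim.
source = json.loads(
    '{"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},'
    '"front":{"alarm":false,"muted":false}},'
    '"lane":{"keepAssist":{"right":false,"left":false}}}}'
)
result = json.loads(
    '{"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},'
    '"front":{"alarm":false,"muted":false}},'
    '"lane":{"keepAssist":{"alarm":false,"muted":false}}}}'
)

# Paths present in the source but absent from the result, and vice versa.
missing = sorted(leaf_paths(source) - leaf_paths(result))
unexpected = sorted(leaf_paths(result) - leaf_paths(source))

print("missing:", missing)
print("unexpected:", unexpected)
```

Only the two leaves under adas_lane_keepAssist differ; everything under adas_parkAssist matches the source.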
If I change one of the adas_lane_keepAssist values in the source JSON to true, then the mapping is correct.
I also thought that maybe inferSchema is the root of the problem, so I made a custom_schema:
from pyspark.sql.types import BooleanType, StructField, StructType

# Explicit schema mirroring the source JSON: parkAssist first, then lane.
custom_schema = StructType([
    StructField("adas", StructType([
        StructField("parkAssist", StructType([
            StructField("rear", StructType([
                StructField("alarm", BooleanType(), True),
                StructField("muted", BooleanType(), True)
            ])),
            StructField("front", StructType([
                StructField("alarm", BooleanType(), True),
                StructField("muted", BooleanType(), True)
            ]))
        ])),
        StructField("lane", StructType([
            StructField("keepAssist", StructType([
                StructField("right", BooleanType(), True),
                StructField("left", BooleanType(), True)
            ]))
        ]))
    ]))
])
and read it like this:
spark.read.schema(custom_schema) \
    .option("multiline", "true") \
    .json("///myfile.json") \
    .first() \
    .asDict()
And I get the same wrong result:
{"adas":{"parkAssist":{"rear":{"alarm":false,"muted":false},"front":{"alarm":false,"muted":false}},"lane":{"keepAssist":{"alarm":false,"muted":false}}}}
The funny thing is that if I change the order in my custom_schema like this:
custom_schema = StructType([
    StructField("adas", StructType([
        StructField("lane", StructType([
            StructField("keepAssist", StructType([
                StructField("right", BooleanType(), True),
                StructField("left", BooleanType(), True)
            ]))
        ])),
        StructField("parkAssist", StructType([
            StructField("rear", StructType([
                StructField("alarm", BooleanType(), True),
                StructField("muted", BooleanType(), True)
            ])),
            StructField("front", StructType([
                StructField("alarm", BooleanType(), True),
                StructField("muted", BooleanType(), True)
            ]))
        ]))
    ]))
])
Now every field of adas_parkAssist_rear and adas_parkAssist_front is wrong:
{"adas":{"lane":{"keepAssist":{"right":false,"left":false}}, "parkAssist":{"rear":{"right":false,"left":false},"front":{"right":false,"left":false}}}}
Is this a limitation of PySpark?