Summary: Running into "Py4JJavaError" while converting list to Dataframe using Python, Jupyter notebook
Key: SPARK-24612
URL: https://issues.apache.org/jira/browse/SPARK-24612
Project: Spark
Issue Type: Question
Components: PySpark
Affects Versions: 2.3.1
Environment:

>python --version
Python 3.6.5 :: Anaconda, Inc.

>java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

>jupyter --version
4.4.0

>conda -V
conda 4.5.4

spark-2.3.0-bin-hadoop2.7

Reporter: A B

rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])

rdd.collect()
[[1, "Alice", 50], [2, "Bob", 80]]

However, when I run df = rdd.toDF() I run into a Py4JJavaError. Any help resolving this error is greatly appreciated.
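A self-contained sketch of the reproduction is below; the SparkSession setup and app name are assumptions for running outside Jupyter, since the notebook in the report already provides sc:

from pyspark.sql import SparkSession

# Assumed bootstrap; in the reported Jupyter setup `sc` already exists.
spark = SparkSession.builder.appName("toDF-repro").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])
print(rdd.collect())  # [[1, 'Alice', 50], [2, 'Bob', 80]]

# This is the call that fails with Py4JJavaError in the reported environment;
# with a working setup it infers columns _1, _2, _3 and returns a DataFrame.
df = rdd.toDF()
df.show()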

Full thread here: http://mail-archives.apache.org/mod_mbox/spark-issues/201806.mbox/%3CJIRA.13167277.1529535154000.212161.1529535180018@Atlassian.JIRA%3E


1 Answer

That's because you use inconsistent types:

  • In the first row the last value is an int.
  • In the second row the last value is a str.

Therefore types are incompatible with the inferred schema.
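A minimal sketch of two common fixes, assuming the original rows really did mix int and str in the last position; the column names and explicit schema here are illustrative, not taken from the report:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Option 1: keep the last value an int in every row so schema inference succeeds,
# and optionally name the columns instead of the default _1, _2, _3.
rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])
df = rdd.toDF(["id", "name", "score"])  # illustrative column names

# Option 2: skip inference and declare the schema up front
# (assumes an active SparkSession named `spark`).
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("score", IntegerType(), True),
])
df = spark.createDataFrame(rdd, schema)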

  • Thank you for the correction. I made a correction: [[1, 'Alice', 50], [2, 'Bob', 80]]. The error still persists. – Whitewolf Jul 15 '18 at 10:37