Summary: Running into "Py4JJavaError" while converting list to Dataframe using Python, Jupyter notebook
Key: SPARK-24612
URL: https://issues.apache.org/jira/browse/SPARK-24612
Project: Spark
Issue Type: Question
Components: PySpark
Affects Versions: 2.3.1
Environment:

>python --version
Python 3.6.5 :: Anaconda, Inc.

>java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

>jupyter --version
4.4.0

>conda -V
conda 4.5.4

spark-2.3.0-bin-hadoop2.7

Reporter: A B

rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])

rdd.collect()
[[1, "Alice", 50], [2, "Bob", 80]]

However, when I run df = rdd.toDF() I run into a Py4JJavaError. Any help resolving this error is greatly appreciated.
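A self-contained sketch of the reproduction is below; the SparkSession setup and app name are assumptions for running outside Jupyter, since the notebook in the report already provides sc:

from pyspark.sql import SparkSession

# Assumed bootstrap; in the reported Jupyter setup `sc` already exists.
spark = SparkSession.builder.appName("toDF-repro").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])
print(rdd.collect())  # [[1, 'Alice', 50], [2, 'Bob', 80]]

# This is the call that fails with Py4JJavaError in the reported environment;
# with a working setup it infers columns _1, _2, _3 and returns a DataFrame.
df = rdd.toDF()
df.show()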

Full thread here: http://mail-archives.apache.org/mod_mbox/spark-issues/201806.mbox/%3CJIRA.13167277.1529535154000.212161.1529535180018@Atlassian.JIRA%3E


1 Answer

That's because you use inconsistent types:

  • In the first row the last value is an int.
  • In the second row the last value is a str.

Therefore types are incompatible with the inferred schema.
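A minimal sketch of two common fixes, assuming the original rows really did mix int and str in the last position; the column names and explicit schema here are illustrative, not taken from the report:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Option 1: keep the last value an int in every row so schema inference succeeds,
# and optionally name the columns instead of the default _1, _2, _3.
rdd = sc.parallelize([[1, "Alice", 50], [2, "Bob", 80]])
df = rdd.toDF(["id", "name", "score"])  # illustrative column names

# Option 2: skip inference and declare the schema up front
# (assumes an active SparkSession named `spark`).
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("score", IntegerType(), True),
])
df = spark.createDataFrame(rdd, schema)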

  • Thank you for the correction. I made a correction: [[1, 'Alice', 50], [2, 'Bob', 80]]. The error still persists. – Whitewolf Jul 15 '18 at 10:37