I am working with Spark 2.1 in Python. I can convert an RDD to a DataFrame with the toDF() method (spark is the SparkSession initialized earlier):
import sys
from decimal import Decimal
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

rdd = spark.read.text(sys.argv[1]).rdd.map(
    lambda l: l[0].replace("24:00", "00:00") if "24:00" in l[0] else l[0])
fields = [StructField("datetime", StringType(), True),
          StructField("temperature", DecimalType(scale=3), True),
          StructField("humidity", DecimalType(scale=1), True)]
schema = StructType(fields)
df = rdd.map(lambda k: k.split(",")).map(
    lambda p: (p[0][5:-3], Decimal(p[5]), Decimal(p[6]))).toDF(schema)
But I cannot find toDF() in the RDD API docs. Please help me understand why toDF() can be called on my RDD. Where is this method inherited from?