I want to create a PySpark DataFrame from a Python dictionary, but the following code
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

df_stable = spark.createDataFrame(dict_stable_feature)
df_stable.show()
shows this error:
TypeError: Can not infer schema for type: <class 'str'>
Reading this post on Stack Overflow:
Pyspark: Unable to turn RDD into DataFrame due to data type str instead of StringType
I can deduce that maybe the problem is that I mistakenly used Python's standard str instead of StringType, and Spark doesn't like it. What can I do to make it work?
EDIT:
I created my dictionary using the code from this question:
Create multiple lists and store them into a dictionary Python
As you can see, each key is built like this:
cc = str(col)
vv = "_" + str(value)
cv = cc + vv
dict_stable_feature[cv] = t
while t is just a binary list of 1s and 0s.