You can proceed as follows:
from pyspark.sql import Row
import ast
import json

l = [Row(payload=u"[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]"),
     Row(payload=u"[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]")]
# convert the list of Rows to an RDD:
ll = sc.parallelize(l)
# parse each payload, merge the one-entry dicts into a single dict,
# and re-serialize it as valid JSON so the reader can infer the schema:
df = sqlContext.read.json(ll.map(lambda r: json.dumps(
    dict(kv for d in ast.literal_eval(r.payload) for kv in d.items()))))
Explanation:
The only non-obvious part is the intermediate expression:

dict(kv for d in ast.literal_eval(r.payload) for kv in d.items())

It parses the payload string (ast.literal_eval is a safer stand-in for eval here, since the payload is a plain Python literal) and merges the list of one-entry dicts, converting this format:

[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]

to this one:

{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
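
To see that conversion in isolation, here is a minimal sketch you can run in plain Python, without Spark, on one sample payload:

import ast

payload = u"[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]"
# parse the Python-literal string into a list of one-entry dicts
parsed = ast.literal_eval(payload)
# flatten the (key, value) pairs of all the dicts into a single dict
merged = dict(kv for d in parsed for kv in d.items())
print(merged)  # {'key3': 'value3', 'key2': 'value2', 'key1': 'value1'} (order may vary)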
Output:

>>> df
DataFrame[key1: string, key2: string, key3: string]
>>> df.show()
+------+------+------+
| key1| key2| key3|
+------+------+------+
|value1|value2|value3|
|value1|value2|value3|
+------+------+------+
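
If you are on Spark 2.x or later, the same idea works through a SparkSession instead of the sc/sqlContext entry points; a minimal sketch, assuming spark is your session:

import ast
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
rdd = spark.sparkContext.parallelize(
    [u"[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]",
     u"[{'key1':'value1'},{'key2':'value2'},{'key3':'value3'}]"])
# same merge as above; json.dumps guarantees the reader gets valid JSON
df = spark.read.json(rdd.map(lambda p: json.dumps(
    dict(kv for d in ast.literal_eval(p) for kv in d.items()))))
df.show()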