I have the data below and I want to group it by the first element. I am trying to do this with the PySpark core RDD API (NOT Spark SQL):
(u'CRIM SEXUAL ASSAULT', u'HZ256372', u'003', u'43'),
(u'THEFT', u'HZ257172', u'011', u'27'),
(u'ASSAULT', u'HY266148', u'019', u'6'),
(u'WEAPONS VIOLATION', u'HY299741', u'010', u'29'),
(u'CRIM SEXUAL ASSAULT', u'HY469211', u'025', u'19'),
(u'NARCOTICS', u'HY313819', u'016', u'11'),
(u'NARCOTICS', u'HY215976', u'003', u'42'),
(u'NARCOTICS', u'HY360910', u'011', u'27'),
(u'NARCOTICS', u'HY381916', u'015', u'25')
I tried with
file.groupByKey().map(lambda x: (x[0], list(x[1]))).collect()
but this didn't work out.
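For clarity, the result I'm after could be sketched in plain Python (no Spark), grouping each tuple's remaining fields under its first element. This is just an illustration of the desired output, using a small subset of the data above, not the Spark code itself:

```python
from collections import defaultdict

# A small subset of the records above (u'' prefixes dropped; they are
# irrelevant to the grouping logic)
records = [
    ('CRIM SEXUAL ASSAULT', 'HZ256372', '003', '43'),
    ('THEFT', 'HZ257172', '011', '27'),
    ('NARCOTICS', 'HY313819', '016', '11'),
    ('NARCOTICS', 'HY215976', '003', '42'),
]

# Collect each record's remaining fields under its first element
grouped = defaultdict(list)
for first, *rest in records:
    grouped[first].append(tuple(rest))

print(dict(grouped))
```

So every key (e.g. 'NARCOTICS') should map to the list of its remaining tuples. This is the grouping I am trying to express with the RDD API.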