I am currently developing an application with Spark in Python. I have a dataset of hotels with the following schema: Id, Hotel name, Address, ..., longitude, latitude.
For each hotel, I would like to compute the 5 nearest hotels.
Is it possible to do this in Spark? I am not sure whether I can parallelize my dataset as an RDD and then compare each row against the entire dataset.
So here is what I tried:

test = booking_data.cartesian(booking_data).map(lambda ((x1, y1), (x2, y2)): distanceBetweenTwoPoints)

distanceBetweenTwoPoints is my function; it computes the distance between two points and takes four parameters (the two longitude/latitude pairs).

The error displayed is:

ValueError: too many values to unpack
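For context, here is the plain-Python version of the logic I am trying to parallelize, so the shape of the data is clear. This is only a sketch under my own simplifications: hotels reduced to (id, longitude, latitude) tuples, a haversine formula standing in for distanceBetweenTwoPoints, and the pair loop playing the role of cartesian().

```python
from itertools import product
from math import radians, sin, cos, asin, sqrt

def distance_between_two_points(lon1, lat1, lon2, lat2):
    """Haversine distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical sample data: (id, longitude, latitude)
hotels = [
    ("h1", 2.35, 48.86),    # Paris
    ("h2", 2.29, 48.87),    # Paris, a few km from h1
    ("h3", -0.12, 51.51),   # London
]

# All ordered pairs of hotels, like RDD.cartesian(RDD) would produce
nearest = {}
for a, b in product(hotels, repeat=2):
    if a[0] == b[0]:
        continue  # skip a hotel paired with itself
    d = distance_between_two_points(a[1], a[2], b[1], b[2])
    nearest.setdefault(a[0], []).append((d, b[0]))

# Keep only the 5 nearest neighbours per hotel
for hotel_id in nearest:
    nearest[hotel_id] = sorted(nearest[hotel_id])[:5]
```

In Spark I would expect the map lambda to receive one argument (the pair of rows) and to call the distance function on the coordinate fields, rather than unpacking the rows in the lambda's parameter list.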