I am new to Scala.
I have a CSV file stored in HDFS. I am reading that file in Scala using:
val salesdata = sc.textFile("hdfs://localhost:9000/home/jayshree/sales.csv")
Here is a small sample of the sales data:
C_ID T_ID ITEM_ID ITEM_Price
5 199 1 500
33 235 1 500
20 249 3 749
35 36 4 757
19 201 4 757
17 94 5 763
39 146 5 763
42 162 5 763
49 41 6 824
3 70 6 824
24 161 6 824
48 216 6 824
I have to perform the following operations on it:
1. Apply a discount to each item's price (the ITEM_Price column, call it d), say 30%. The formula is d = d - 0.30 * d.
2. Find the minimum and maximum item price for each customer after applying the 30% discount to each item.
I tried multiplying the values in the ITEM_Price column by 30. The problem is that each value of d is read as a string, so multiplying it by a number repeats the string that many times instead of doing arithmetic (for example, 500 * 3 gives 500500500).
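To make the string behaviour concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name StringVsNumber and the value names are only illustrative; the 30% rate comes from the formula above.

```scala
object StringVsNumber {
  // A field read from a text file arrives as a String.
  val raw: String = "500"

  // On a String, `*` repeats the text instead of multiplying.
  val repeated: String = raw * 3

  // Converting to a number first gives real arithmetic.
  val price: Double = raw.toDouble

  // d = d - 0.30 * d, the 30% discount from the formula above.
  val discounted: Double = price - 0.30 * price

  def main(args: Array[String]): Unit = {
    println(repeated)   // the text "500" repeated three times
    println(discounted) // the numeric price after the discount
  }
}
```

So the field has to be converted (for example with toDouble or toInt) before any arithmetic is applied to it.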
I could convert it into a DataFrame and do it there, but I would like to know whether these operations can be done directly on an RDD, without converting it to a DataFrame.
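For reference, a minimal sketch of how both steps might look on a plain RDD. It rests on several assumptions that are not confirmed above: the first line of the file is the header, the columns are separated by commas and/or whitespace, and the job runs with a local master. The names SalesDiscount, parse and DiscountRate are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SalesDiscount {
  // Assumed discount rate: 30%, as in step 1.
  val DiscountRate = 0.30

  // Turn one data line into (C_ID, discounted ITEM_Price).
  // Assumes four columns separated by commas and/or whitespace.
  def parse(line: String): (Int, Double) = {
    val cols = line.trim.split("[,\\s]+")
    val price = cols(3).toDouble // convert the String before doing arithmetic
    (cols(0).toInt, price - DiscountRate * price)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption; inside spark-shell, reuse the existing `sc` instead.
    val conf = new SparkConf().setAppName("SalesDiscount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val salesdata = sc.textFile("hdfs://localhost:9000/home/jayshree/sales.csv")

    // Assumes the first line is the header row; drop it before parsing.
    val header = salesdata.first()
    val discounted = salesdata.filter(_ != header).map(parse)

    // Step 2: carry a (min, max) pair per customer and merge the pairs by key.
    val minMax = discounted
      .mapValues(p => (p, p))
      .reduceByKey { case ((min1, max1), (min2, max2)) =>
        (math.min(min1, min2), math.max(max1, max2))
      }

    minMax.collect().foreach { case (cid, (mn, mx)) =>
      println(s"customer $cid: min=$mn, max=$mx")
    }

    sc.stop()
  }
}
```

The key step is the toDouble conversion inside parse; after that, map and reduceByKey work on ordinary numeric pairs and no DataFrame is involved.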