
I am new to Spark and I am trying to implement a Spark program in Java. I just want to read multiple files from a folder and combine them by pairing each word@filename as the key and its count as the value.

I don't know how to combine all the data together. I want the output to be pairs of the form (word@filename, 1).

For example: (happy@file1,2) (newyear@file1,1) (newyear@file2,1)

user3152493

1 Answer


Refer to the Spark Java API documentation for input_file_name(): https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#input_file_name()

and to this answer: https://stackoverflow.com/a/36356253/8357778

With these you will be able to add a column containing the filename to the DataFrame storing your data. After that, you just have to select and transform your rows as you want.

If you prefer using an RDD, convert your DataFrame and map it.
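Here is a minimal sketch of the DataFrame approach in Java, not taken from the original answer: it assumes Spark 2.x with spark-sql on the classpath, and the folder path "data/" and the class name WordAtFileCount are placeholders.

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class WordAtFileCount {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("WordAtFileCount")
                .master("local[*]")
                .getOrCreate();

        // Each row holds one line of text plus the path of its source file.
        Dataset<Row> lines = spark.read().text("data/")
                .withColumn("path", input_file_name())
                // input_file_name() returns the full URI; keep only the file name.
                .withColumn("filename", regexp_extract(col("path"), "[^/]+$", 0));

        // Split each line into words, build "word@filename" keys, and count them.
        Dataset<Row> counts = lines
                .withColumn("word", explode(split(col("value"), "\\s+")))
                .filter(length(col("word")).gt(0))
                .withColumn("key", concat(col("word"), lit("@"), col("filename")))
                .groupBy("key")
                .count();

        counts.show(false);
        spark.stop();
    }
}
```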

Franck Cussac
  • I am using an RDD, and I am reading the files with wholeTextFiles(). Could you please help solve this? Sharing any piece of code would be much more helpful. – user3152493 Feb 07 '18 at 04:23
  • Following the second link, you have all the code you need. It is not possible to map an RDD with the filename. Read the files with Spark SQL and convert the result to an RDD. As for building the keys with "@", I think you can do that yourself ;) – Franck Cussac Feb 08 '18 at 08:03
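For the RDD route discussed above, here is a hedged sketch that continues from the previous snippet: the lines DataFrame and its value and filename columns are assumptions carried over from that sketch, not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

public class WordAtFileCountRdd {
    // "lines" is the DataFrame from the previous sketch (columns "value" and "filename").
    static void countAsRdd(Dataset<Row> lines) {
        // Convert the DataFrame to a JavaRDD, emit (word@filename, 1) pairs,
        // and sum the counts per key.
        JavaPairRDD<String, Integer> counts = lines.javaRDD()
                .flatMapToPair(row -> {
                    String line = row.getAs("value");
                    String file = row.getAs("filename");
                    List<Tuple2<String, Integer>> out = new ArrayList<>();
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) {
                            out.add(new Tuple2<>(word + "@" + file, 1));
                        }
                    }
                    return out.iterator();
                })
                .reduceByKey((a, b) -> a + b);

        counts.collect().forEach(System.out::println);  // e.g. (happy@file1,2)
    }
}
```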