I am a Python programmer. The Python API is too slow for my Spark application, so I decided to port my code to the Spark Scala API to compare the computation time.
I am trying to keep only the lines that start with a numeric character from a huge file, using the Scala API in Spark. In my file, some lines have numbers and some have words, and I want only the lines that have numbers.
In my Python application, I have these lines:
l = sc.textFile("my_file_path")
l_filtered = l.filter(lambda s: s[0].isdigit())
which works exactly as I want.
This is what I have tried so far in Scala:
val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.forall(_.isDigit))
This throws an error saying that Char does not have a forall() function.
I also tried taking the first character of each line using x.take(1) and applying the isDigit() function to it, in the following way:
val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.take(1).isDigit)
and this too...
val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.take(1).Character.isDigit)
This also throws an error.
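For comparison, here is a minimal sketch of the two possible checks on plain Scala strings, without Spark. It assumes the goal is either "first character is a digit" (what the Python code does) or "every character is a digit". The object name, method names, and sample data are made up for illustration. One thing it highlights is that `take(1)` on a `String` returns another `String`, not a `Char`, which is presumably why `isDigit` is not found on it.

```scala
object DigitFilterSketch {
  // True when the first character is a digit; an empty line yields false.
  // Roughly mirrors Python's s[0].isdigit(), without failing on "".
  def startsWithDigit(line: String): Boolean =
    line.headOption.exists(_.isDigit)

  // True when the line is non-empty and every character is a digit.
  // The nonEmpty guard matters because "".forall(...) is vacuously true.
  def onlyDigits(line: String): Boolean =
    line.nonEmpty && line.forall(_.isDigit)

  // Hypothetical stand-in for the lines of the file.
  val sample: Seq[String] = Seq("123", "42 apples", "apples 42", "")

  // Lines whose first character is a digit.
  val startingWithDigit: Seq[String] = sample.filter(startsWithDigit)

  // Lines made up of digits only.
  val digitsOnly: Seq[String] = sample.filter(onlyDigits)
}
```

With the sample above, `startingWithDigit` keeps "123" and "42 apples", while `digitsOnly` keeps only "123".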
This is probably a small error, but as I am not accustomed to Scala syntax, I am having a hard time figuring it out. Any help would be appreciated.
Edit: As suggested in an answer to this question, I tried writing the function, but I am unable to use it in the filter() function in my application, that is, to apply the function to all the lines in the file.
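To make the edit concrete, this is a rough sketch of how a named predicate can be handed to `filter`. It uses a plain `Seq[String]` in place of the RDD so it runs without Spark; `isNumericLine` is a hypothetical name, not the function from the answer. Since `RDD.filter` also takes a `String => Boolean` for an RDD of lines, the same two call styles should carry over to `l.filter(...)`, though that part is an assumption that is not exercised here.

```scala
object FilterWithFunctionSketch {
  // Hypothetical predicate: true when the first character is a digit.
  def isNumericLine(line: String): Boolean =
    line.nonEmpty && line.charAt(0).isDigit

  // Stand-in for sc.textFile("my_file_path"); the contents are made up.
  val lines: Seq[String] = Seq("10 20 30", "hello", "7up", "", "x1")

  // Style 1: pass the method name directly; Scala converts it to a function.
  val keptByName: Seq[String] = lines.filter(isNumericLine)

  // Style 2: wrap the call in an explicit lambda, one line at a time.
  val keptByLambda: Seq[String] = lines.filter(line => isNumericLine(line))
}
```

Both styles apply the predicate to every element and keep the ones for which it returns true, so `keptByName` and `keptByLambda` hold the same lines.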