How can I find the length of the RDD below?
var mark = sc.parallelize(List(1,2,3,4,5,6))
scala> mark.map(l => l.length).collect
<console>:27: error: value length is not a member of Int
mark.map(l => l.length).collect
First, clarify what exactly you want. In your example you are running a map function, so it looks like you are trying to get the length of each element of the RDD, not the size of the RDD itself.
sc.textFile loads every line as a String, so you can call the length method on each element. parallelize, on the other hand, keeps the element type of the collection you pass it. Your list is made of integers, so mark is an RDD[Int], and Int has no length method.
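The error in the question is raised by the Scala compiler's check on the element type, not by Spark, so a plain Scala List reproduces the same behaviour. A minimal sketch using only the standard library; the names ElementTypeDemo, lines, lineLengths and marks are illustrative, and the String list merely stands in for what sc.textFile would produce:

```scala
// Plain Scala, no Spark needed: "value length is not a member of Int"
// is a compile-time check on the element type, so a List shows it too.
object ElementTypeDemo {
  // Strings, standing in for the lines sc.textFile produces: String has length.
  val lines: List[String] = List("spark", "rdd", "scala")
  val lineLengths: List[Int] = lines.map(l => l.length) // List(5, 3, 5)

  // Ints, like the list passed to parallelize in the question.
  val marks: List[Int] = List(1, 2, 3, 4, 5, 6)
  // marks.map(l => l.length) // does not compile: value length is not a member of Int

  def main(args: Array[String]): Unit = {
    println(lineLengths) // prints List(5, 3, 5)
  }
}
```

RDD.map takes the same kind of element-wise function, which is why the same call fails on an RDD[Int].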
If you want the size of the RDD, call count on the RDD itself, not on each element:
mark.count()
This will return 6 (as a Long).
If you want the size of each element, you can convert the elements to String first, although that seems like an unusual requirement. It would look something like this:
mark.map(l => l.toString.length).collect
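Both answers can also be put into a standalone program outside the REPL. This is a sketch that assumes spark-core is on the classpath and that Spark runs in local mode; the object name MarkSizes and the helper sizes are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MarkSizes {
  // Returns (number of elements, length of each element's string form).
  def sizes(sc: SparkContext): (Long, Array[Int]) = {
    val mark = sc.parallelize(List(1, 2, 3, 4, 5, 6)) // RDD[Int]
    val total: Long = mark.count()                     // size of the whole RDD
    val lengths = mark.map(l => l.toString.length).collect() // per-element lengths
    (total, lengths)
  }

  def main(args: Array[String]): Unit = {
    // Local mode, assumed here only so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("mark-sizes").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (total, lengths) = sizes(sc)
      println(total)                 // prints 6
      println(lengths.mkString(",")) // prints 1,1,1,1,1,1
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, sc already exists, so only the body of sizes is needed there.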