I have an RDD[(String, Iterable[WikipediaArticle])] which looks something like this:
(Groovy,CompactBuffer(WikipediaArticle( {has a String title} , {has some text corresponding to that title}), WikipediaArticle( {has a String title} , {has some text corresponding to that title})))
The curly brackets above are only placeholders that distinguish the title from the text; they are not part of the actual data.
Groovy: the String key (a name)
WikipediaArticle: a class with two attributes, title and text
I need an output of type: List[(String, Int)]
where:
String: the first element of each tuple in the RDD, which is unique per record
In the above case that is "Groovy"
Int: the number of WikipediaArticle objects inside the CompactBuffer for that String
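To make the target shape concrete, here is a minimal sketch of one way the transformation could look. It assumes WikipediaArticle is a case class with two String fields and that the result is small enough to collect to the driver; the object name ArticleCounts, the helper countPerKey, and the sample data are hypothetical and only there for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Assumed shape of the class described above: two String attributes.
case class WikipediaArticle(title: String, text: String)

object ArticleCounts {
  // Replace each Iterable with its size, then bring the pairs to the driver.
  // collect() is only reasonable when the number of keys is small.
  def countPerKey(grouped: RDD[(String, Iterable[WikipediaArticle])]): List[(String, Int)] =
    grouped.mapValues(_.size).collect().toList

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("article-counts").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the real grouped RDD.
      val sample: Seq[(String, Iterable[WikipediaArticle])] = Seq(
        ("Groovy", Iterable(WikipediaArticle("t1", "text1"), WikipediaArticle("t2", "text2"))),
        ("Scala", Iterable(WikipediaArticle("t3", "text3")))
      )
      val grouped = sc.parallelize(sample)
      countPerKey(grouped).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If only the counts are needed, it may be cheaper to skip the grouping step and count on the ungrouped data with reduceByKey, since that avoids materializing every article in a buffer.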
I have tried to make this as clear as possible. If anything is unclear or the question could be improved, please feel free to ask.