How can I verify and replace all values in a tuple? In the example below, I want to replace every element of the tuple with 0 whenever its value is NA. Is there a generic statement for this, rather than verifying each element individually?

eg:

b= RDD[String]

Sample Data

2003,1,29,3,1651,1655,1912,1913,UA,1017,N202UA,141,138,119,-1,-4,ORD,MSY,837,5,17,0,NA,0,NA,NA,NA,NA,NA
2003,1,30,4,1654,1655,1910,1913,UA,1017,N311UA,136,138,108,NA,NA,ORD,MSY,837,2,26,0,NA,0,NA,NA,NA,NA,NA

Desired c =
(1017,-1,-4,ORD,MSY)
(1017,0,0,ORD,MSY)

val c = b.map( x => x.split(",")).map(x => (x(9),x(14),x(15),x(16),x(17))).map(x => if (_._ == "NA") "0" else _._)
vkrishna
  • Which tuple item do you want to replace? – Yuval Itzchakov Apr 25 '17 at 06:01
  • There is only one tuple from the map; I want to verify all elements in the tuple – vkrishna Apr 25 '17 at 06:26
  • I do not understand the problem. Try to clarify it, please. Use flatMap, instead of the first map. What is `b`? Can you give a sample? Can you give an example of input and desired output? – vefthym Apr 25 '17 at 07:36
  • You seem to be trying to map over a tuple (within the outer Spark `map`). This question may be relevant: https://stackoverflow.com/questions/2339863/use-map-and-stuff-on-scala-tuples – DNA Apr 25 '17 at 07:42

1 Answer

Use filter on your RDD instead of map.

Vishal
  • Is this an attempt to answer the OP's question, or just a recommendation? Can you elaborate on that? – vefthym Apr 25 '17 at 07:39
  • You can use the DataFrame.na.fill() method in Scala and the DataFrame.fillna() method in Python. Here are the relevant links: Scala: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameNaFunctions Python: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.fillna – Vishal Apr 25 '17 at 09:38
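
For the replacement the question asks about, one possible approach is to clean each field while it is still in the array produced by split, and only then build the tuple, since a Scala tuple has no map method of its own. The sketch below is a minimal illustration under stated assumptions, not a definitive answer: it uses plain Scala collections in place of the RDD so that it runs without Spark, and the names ReplaceNA, wanted, naToZero and parse are hypothetical, introduced only for this example.

```scala
// Minimal sketch: plain Scala collections stand in for the RDD[String] `b`.
// All names here (ReplaceNA, wanted, naToZero, parse) are hypothetical.
object ReplaceNA {
  // Column indices of the five fields selected in the question.
  val wanted: Seq[Int] = Seq(9, 14, 15, 16, 17)

  // Replace the "NA" marker with "0"; leave every other value unchanged.
  def naToZero(s: String): String = if (s == "NA") "0" else s

  // Split one CSV line, pick the wanted fields, clean each one,
  // and only then assemble the 5-tuple.
  def parse(line: String): (String, String, String, String, String) = {
    val fields = line.split(",")
    wanted.map(i => naToZero(fields(i))) match {
      case Seq(a, b, c, d, e) => (a, b, c, d, e)
      case other => sys.error(s"expected 5 fields, got ${other.size}")
    }
  }

  def main(args: Array[String]): Unit = {
    // The two sample records from the question.
    val lines = Seq(
      "2003,1,29,3,1651,1655,1912,1913,UA,1017,N202UA,141,138,119,-1,-4,ORD,MSY,837,5,17,0,NA,0,NA,NA,NA,NA,NA",
      "2003,1,30,4,1654,1655,1910,1913,UA,1017,N311UA,136,138,108,NA,NA,ORD,MSY,837,2,26,0,NA,0,NA,NA,NA,NA,NA"
    )
    lines.map(parse).foreach(println)
  }
}
```

With an actual RDD, the same function could presumably be passed straight to map, for example b.map(ReplaceNA.parse). Doing the replacement before the tuple is built avoids having to check each tuple position individually.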