Please help me convert the RDD of IP addresses below into a DataFrame.

(Full disclosure: I have little experience working with RDDs.)

RDD CREATION:

val SCND_RDD = FIRST_RDD.map(kv => kv._2).flatMap(r => r.get("ip")).map(o => o.asInstanceOf[scala.collection.mutable.Map[String, String]]).flatMap(ip => ip.get("address"))

SCND_RDD.take(3)

RESULTS:

SCND_RDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[33] at flatMap at <console>:38

res87: Array[String] = Array(5.42.212.99, 51.34.21.60, 63.99.831.7)

My RDD-to-DataFrame conversion attempt:

case class X(callId: String)

val userDF = SCND_RDD.map{case Array(s0)=>X(s0)}.toDF()

This is the error I get:

defined class X

<console>:40: error: scrutinee is incompatible with pattern type;

 found   : Array[T]
 required: String
       val userDF = NIPR_RDD22.map{case Array(s0)=>X(s0)}.toDF()
touelv
  • Does this answer your question? [How to convert rdd object to dataframe in spark](https://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark) – Lamanus Aug 25 '20 at 01:34
  • Unfortunately no – touelv Aug 25 '20 at 01:36

1 Answer

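I left a comment linking a possible duplicate question that might help you.

Here is what I tried as well. Your SCND_RDD is already an RDD[String], so each element is a plain String and there is no Array to pattern-match on, which is what the "found: Array[T], required: String" error is telling you. You can call toDF() on the RDD directly: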

val rdd = sc.parallelize(Array("test", "test2", "test3"))
rdd.take(3)

//res53: Array[String] = Array(test, test2, test3)

val df = rdd.toDF()
df.show

+-----+
|value|
+-----+
| test|
|test2|
|test3|
+-----+
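
If you want a named column or the case class from the question instead of the default "value" column, something like the following should work. This is only a sketch: the column name "address" is my own choice, the sample IPs are the ones from the question's output standing in for SCND_RDD, and the SparkSession setup is an assumption for running outside spark-shell (inside spark-shell, getOrCreate() just reuses the existing session).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
import spark.implicits._ // brings toDF() into scope for RDDs

// Same case class as in the question.
case class X(callId: String)

// Stand-in for SCND_RDD: an RDD[String] of the addresses shown in the question.
val rdd = spark.sparkContext.parallelize(Seq("5.42.212.99", "51.34.21.60", "63.99.831.7"))

// Option 1: name the single column directly ("address" is a hypothetical name).
val named = rdd.toDF("address")

// Option 2: wrap each String in the case class; no Array pattern is needed
// because every element is already a String.
val typed = rdd.map(X(_)).toDF()

named.printSchema()
typed.show()
```

With option 2 the column name comes from the case class field, so the DataFrame has a single "callId" column.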
Lamanus