How to create a Dataframe from existing Dataframe and make specific fields as Struct type?

Question

I need to create a DataFrame from an existing DataFrame, and I also need to change the schema.

I have a DataFrame like:

+-----+--------+----------+
|Id   |Position|playerName|
+-----+--------+----------+
|10125|Forward |Messi     |
|10126|Forward |Ronaldo   |
|10127|Midfield|Xavi      |
|10128|Midfield|Neymar    |
+-----+--------+----------+

I created it using the case class below:

case class caseClass(
  Id: Int = 0,            // an Int field needs an Int default, not ""
  Position: String = "",
  playerName: String = ""
)

Now I need to put both playerName and Position under a struct type.

That is, I need to create another DataFrame with this schema:

root
 |-- Id: integer (nullable = true)
 |-- playerDetails: struct (nullable = true)
 |    |-- playerName: string (nullable = true)
 |    |-- Position: string (nullable = true)

I wrote the following code to create a new DataFrame, following this article: https://medium.com/@mrpowers/adding-structtype-columns-to-spark-dataframes-b44125409803

My schema was:

import org.apache.spark.sql.types._

val myschema = List(
  StructField("Id", IntegerType, true),
  StructField("Position", StringType, true),
  StructField("playerName", StringType, true)
)

and I tried the following code:

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(myschema)
)

but I can't get it to work.
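The schema in myschema is flat (three top-level fields), so createDataFrame has nothing it could turn into a struct column. A minimal sketch of the same createDataFrame approach with a nested schema, assuming the four players from the table above as sample data; the names nestedSchema, nestedData and nestedDF are illustrative only. Each outer Row carries an inner Row for the struct column, with values in the same order as the inner StructType's fields.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("nested-schema").getOrCreate()

// Nested schema: "playerDetails" is itself a StructType with two string fields
val nestedSchema = StructType(List(
  StructField("Id", IntegerType, true),
  StructField("playerDetails", StructType(List(
    StructField("playerName", StringType, true),
    StructField("Position", StringType, true)
  )), true)
))

// Illustrative data: the inner Row fills the struct column (playerName, Position)
val nestedData = Seq(
  Row(10125, Row("Messi", "Forward")),
  Row(10126, Row("Ronaldo", "Forward")),
  Row(10127, Row("Xavi", "Midfield")),
  Row(10128, Row("Neymar", "Midfield"))
)

val nestedDF = spark.createDataFrame(spark.sparkContext.parallelize(nestedData), nestedSchema)
nestedDF.printSchema()
nestedDF.show(false)
```

This works, but it means writing the schema by hand; the struct function or a nested case class avoids that.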

I saw a similar question, "Change schema of existing dataframe", but I couldn't understand the solution.

Is there any way to define the StructType directly inside the case class, so that I don't need to build my own schema to create struct values?

score 3 · Accepted Answer · answered Apr 02 '19 at 09:52


The "struct" function (from org.apache.spark.sql.functions) can be used:

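import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// session and implicits ($"..." syntax, toDF); spark-shell already provides both
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// data
val playersDF = Seq(
  (10125, "Forward", "Messi"),
  (10126, "Forward", "Ronaldo"),
  (10127, "Midfield", "Xavi"),
  (10128, "Midfield", "Neymar")
).toDF("Id", "Position", "playerName")

// action: wrap playerName and Position into a single struct column
val playersStructuredDF = playersDF.select($"Id", struct("playerName", "Position").as("playerDetails"))
// display
playersStructuredDF.printSchema()
playersStructuredDF.show(false)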

Output:

root
 |-- Id: integer (nullable = false)
 |-- playerDetails: struct (nullable = false)
 |    |-- playerName: string (nullable = true)
 |    |-- Position: string (nullable = true)

+-----+------------------+
|Id   |playerDetails     |
+-----+------------------+
|10125|[Messi, Forward]  |
|10126|[Ronaldo, Forward]|
|10127|[Xavi, Midfield]  |
|10128|[Neymar, Midfield]|
+-----+------------------+
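On the last part of the question (defining the struct in the case class itself): Spark's encoders map a field whose type is another case class to a struct column, so a nested case class yields the struct without a hand-written schema. A minimal sketch, where PlayerDetails and Player are hypothetical names chosen for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes: PlayerDetails becomes the struct, Player is the row type
case class PlayerDetails(playerName: String, Position: String)
case class Player(Id: Int, playerDetails: PlayerDetails)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// toDF derives the schema from the case classes, including the nested struct
val playersNestedDF = Seq(
  Player(10125, PlayerDetails("Messi", "Forward")),
  Player(10126, PlayerDetails("Ronaldo", "Forward")),
  Player(10127, PlayerDetails("Xavi", "Midfield")),
  Player(10128, PlayerDetails("Neymar", "Midfield"))
).toDF()

playersNestedDF.printSchema()
playersNestedDF.show(false)
```

The field names of the nested case class become the field names inside the struct.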

pasha701

6,831
1
15
22

Your solution is good, but in my case I load the values into the case class like this: _val inputFile = spark.read.textFile("C:\\Users\\adarsh.k\\Downloads\\players.txt"); inputFile.map(lines => { val Id = extractFun(lines, """regex"""); val position = extractFun(lines, """regex"""); val playerName = extractFun(lines, """regex"""); caseClass(Id, position, playerName) }).toDF()_ and I have a function (extractFun) that extracts the data using a regex. – ADARSH K Apr 02 '19 at 10:11
When using this solution I got the error **Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`Id`' given input columns: [value];;** – ADARSH K Apr 02 '19 at 10:12
Looks like something is wrong with the extraction from the original data into the case class; I guess this is out of scope of the original question. – pasha701 Apr 02 '19 at 10:13
Sorry, it was actually my mistake. I hadn't kept the result of **inputFile.map** as a DataFrame. When I wrote **val newData = inputFile.map(lines => {** **........}** it worked for me. Thanks a lot. – ADARSH K Apr 02 '19 at 11:34
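A sketch of the working flow described in this comment, under stated assumptions: the input lines are assumed to look like "10125 Forward Messi", spark.read.textFile is replaced by an in-memory Dataset so the example is self-contained, and extractFun below is a hypothetical regex helper, since the real one and its patterns were not shown. The key point is selecting from the mapped result (newData), because inputFile itself only has a single "value" column, which is what the AnalysisException above reports.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

case class caseClass(Id: Int = 0, Position: String = "", playerName: String = "")

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical helper: returns the first capture group of `pattern`, or "" if there is no match
def extractFun(line: String, pattern: String): String =
  pattern.r.findFirstMatchIn(line).map(_.group(1)).getOrElse("")

// Assumed line format; in the question these lines come from spark.read.textFile(...)
val inputFile = Seq("10125 Forward Messi", "10127 Midfield Xavi").toDS()

// Keep the mapped result: it has the columns Id, Position, playerName
val newData = inputFile.map { line =>
  val id = extractFun(line, """^(\d+)""").toInt
  val position = extractFun(line, """^\d+\s+(\S+)""")
  val playerName = extractFun(line, """^\d+\s+\S+\s+(\S+)""")
  caseClass(id, position, playerName)
}.toDF()

// Now the struct selection from the accepted answer resolves "Id"
val structured = newData.select($"Id", struct("playerName", "Position").as("playerDetails"))
structured.show(false)
```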
Now I get output like [Messi, Forward]. Can I get it as **[Messi:Forward]**, like a key-value pair? – ADARSH K Apr 02 '19 at 11:42
From the original DataFrame: playersDF.select($"Id", concat_ws(":", $"playerName", $"Position")) – pasha701 Apr 02 '19 at 11:45
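Two ways to get a playerName/Position pair, sketched on a small sample of the players from the question. concat_ws joins its columns in the order given, so playerName must come first to produce "Messi:Forward"; the result is a plain string column. If an actual key-value structure is wanted, the map function from org.apache.spark.sql.functions builds a MapType column instead. The column alias playerDetails is an illustrative choice.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, map}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val playersDF = Seq(
  (10125, "Forward", "Messi"),
  (10127, "Midfield", "Xavi")
).toDF("Id", "Position", "playerName")

// String column such as "Messi:Forward" (playerName first, then Position)
val asStringDF = playersDF.select($"Id", concat_ws(":", $"playerName", $"Position").as("playerDetails"))

// MapType column with playerName as the key and Position as the value
val asMapDF = playersDF.select($"Id", map($"playerName", $"Position").as("playerDetails"))

asStringDF.show(false)
asMapDF.show(false)
```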
