I'm using Spark 1.6 with Scala.
I was looking here but I didn't find a clear answer.
I have a big file. After filtering out the first lines, which contain a copyright notice and other metadata, I want to take the header (104 fields) and convert it to a StructType schema.
I was thinking of using a class that extends the Product trait to define the schema of the DataFrame, and then converting the data to a DataFrame according to that schema.
What is the best way to do it?
This is a sample from my file:
text (06.07.03.216) COPYRIGHT © skdjh 2000-2016
text 160614_54554.vf Database 53643_csc Interface 574 zn 65
Start Date 14/06/2016 00:00:00:000
End Date 14/06/2016 00:14:59:999
State "s23"
cin. Nb Start End Event Con. Duration IMSI
32055680 16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
32055680 16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
32055680 16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
32055680 16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
32055680 16/09/2010 16:59:59:245 16/09/2016 17:00:00:000 xxxxxxxxxxxxx
I want to convert it to a Spark SQL DataFrame with a schema like this:
+----------+------------+--------------+------------+--------------+-------+
| cin_Nb   | Start      | End          | Event      | Con_Duration | IMSI  |
+----------+------------+--------------+------------+--------------+-------+
| 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016 | 17:00:00:000 | xxxxx |
| 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016 | 17:00:00:000 | xxxxx |
| 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016 | 17:00:00:000 | xxxxx |
| 20556800 | 16/09/2010 | 16:59:59:245 | 16/09/2016 | 17:00:00:000 | xxxxx |
| 32055680 | 16/09/2010 | 16:59:59:245 | 16/09/2016 | 17:00:00:000 | xxxxx |
+----------+------------+--------------+------------+--------------+-------+
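
To make the goal concrete, here is a minimal sketch of one way the schema could be built directly from the header line instead of from a Product class. It is not a tested solution. It rests on several assumptions that the sample above does not confirm: the fields are tab-separated (which would explain header names such as "cin. Nb" containing spaces), the preamble is always the first 5 lines, the input path is passed as the first program argument, and every column can be read as a string. The names `HeaderToSchema`, `cleanName`, `preambleLines` and `delimiter` are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object HeaderToSchema {

  // Turn a raw header name such as "cin. Nb" into a column name such as "cin_Nb":
  // every run of non-alphanumeric characters becomes a single underscore.
  def cleanName(raw: String): String =
    raw.trim.replaceAll("[^A-Za-z0-9]+", "_").stripSuffix("_")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HeaderToSchema").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val preambleLines = 5L // assumption: copyright and metadata lines before the header
    val delimiter = "\t"   // assumption: tab-separated fields

    // Drop the preamble by line index, keeping the header and the data rows.
    val body = sc.textFile(args(0))
      .zipWithIndex()
      .filter { case (_, index) => index >= preambleLines }
      .map { case (line, _) => line }

    // The first remaining line is the header; each field becomes a nullable string column.
    val header = body.first()
    val schema = StructType(
      header.split(delimiter).map(name => StructField(cleanName(name), StringType, nullable = true))
    )

    // Split the data rows and keep only those whose field count matches the schema.
    val rows = body
      .filter(_ != header)
      .map(_.split(delimiter, -1))
      .filter(_.length == schema.length)
      .map(fields => Row.fromSeq(fields))

    val df = sqlContext.createDataFrame(rows, schema)
    df.printSchema()

    sc.stop()
  }
}
```

Reading everything as StringType keeps the sketch independent of the real column types. Date and numeric columns would still need to be cast afterwards.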