I'm a beginner with Spark. I have a set of Avro records and I'm creating a Dataset from them like this:
Dataset<Row> ds = spark.read().format("com.databricks.spark.avro")
    .option("avroSchema", schema.toString()).load("./*.avro");
One of my columns has values like this:
+------------------------+
| col1                   |
| VCE_B_WSI_20180914_573 |
| WCE_C_RTI_20181223_324 |
+------------------------+
I want to split this column into multiple columns and then group by the new columns, like below:
+----+----+----+
|col1|col2|col3|
| VCE|   B| WSI|
| WCE|   C| RTI|
+----+----+----+
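To make the intent concrete, here is a rough sketch in plain Java (no Spark involved) of the per-value split I have in mind. The class name SplitSketch and the helper firstThreeParts are just names I made up for illustration, and I'm assuming every value has at least three underscore-separated parts, as in the two sample rows above.

```java
import java.util.Arrays;

class SplitSketch {
    // Hypothetical helper: split on "_" and keep only the first three tokens,
    // which would become col1, col2 and col3.
    static String[] firstThreeParts(String value) {
        String[] tokens = value.split("_");
        if (tokens.length < 3) {
            // Assumption: every value has at least three parts.
            throw new IllegalArgumentException("Expected at least 3 parts: " + value);
        }
        return Arrays.copyOf(tokens, 3);
    }

    public static void main(String[] args) {
        String[] samples = {"VCE_B_WSI_20180914_573", "WCE_C_RTI_20181223_324"};
        for (String sample : samples) {
            // Prints "VCE | B | WSI" and then "WCE | C | RTI"
            System.out.println(String.join(" | ", firstThreeParts(sample)));
        }
    }
}
```

What I don't know is how to express this same logic on a Dataset<Row> so that the three tokens end up as separate columns I can group by.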
I would really appreciate any tips on how to go about this. Should I convert the Dataset to an RDD and apply these transformations there? I'm not sure whether I can add new columns to an RDD.