Is there a way to rename the columns of a Dataset using Jackson annotations while creating it?

My encoder class is as follows:

import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.*;
import java.io.Serializable; // plain Java serialization; scala.Serializable is not needed here
import javax.persistence.Table;

@Builder
@Data
@AllArgsConstructor
@EqualsAndHashCode
@Table(name = "sample_table")
public class SampleRecord implements Serializable {
    @JsonProperty("sample_id")
    private Long sampleId;
    @JsonProperty("sample_name")
    private String name;
    @JsonProperty("sample_desc")
    private String description;
}

My aim is to rename the columns according to the @JsonProperty values, so that I can reuse the same class for both the Dataset and the JSON functionality.

Relevant module versions:
  • Spark: 2.4.0 (with Scala 2.11)
  • jackson-module-scala_2.11: 2.9.6

Let me know if you need more information. Help appreciated.

Naman
Arjav96

2 Answers

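import com.fasterxml.jackson.annotation.JsonProperty;
import java.io.Serializable;

// The @JsonProperty annotations on the setters tell Jackson which JSON keys
// map to which fields when deserializing; the Java field names are unchanged.
public class SampleRecord implements Serializable {
    private Long sampleId;
    private String name;
    private String description;

    @JsonProperty("sample_id")
    public void setSampleId(Long sampleId) {
        this.sampleId = sampleId;
    }

    @JsonProperty("sample_name")
    public void setName(String name) {
        this.name = name;
    }

    @JsonProperty("sample_desc")
    public void setDescription(String description) {
        this.description = description;
    }
}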
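A minimal sketch, assuming jackson-databind 2.9.x on the classpath, of what these setter annotations do: Jackson binds the JSON keys sample_id, sample_name and sample_desc to the fields when reading. The getters and the JsonSetterDemo wrapper are hypothetical additions made only so the result can be inspected. As far as I know, Spark's bean encoder derives column names from JavaBean getter names via java.beans introspection and does not read Jackson annotations, so this alone does not rename the Dataset columns.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JsonSetterDemo {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Same shape as the answer's class; getters are added only for inspection.
    public static class SampleRecord implements Serializable {
        private Long sampleId;
        private String name;
        private String description;

        @JsonProperty("sample_id")
        public void setSampleId(Long sampleId) { this.sampleId = sampleId; }

        @JsonProperty("sample_name")
        public void setName(String name) { this.name = name; }

        @JsonProperty("sample_desc")
        public void setDescription(String description) { this.description = description; }

        public Long getSampleId() { return sampleId; }
        public String getName() { return name; }
        public String getDescription() { return description; }
    }

    // Jackson resolves "sample_id" etc. through the annotated setters.
    public static SampleRecord parse(String json) {
        try {
            return MAPPER.readValue(json, SampleRecord.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SampleRecord r = parse("{\"sample_id\": 1, \"sample_name\": \"a\", \"sample_desc\": \"first\"}");
        System.out.println(r.getSampleId() + " " + r.getName() + " " + r.getDescription());
    }
}
```

Running main should print the three values read from the snake_case keys.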
YouXiang-Wang
Interesting idea. The way I would do it:

  1. Ingest your data into a dataframe.
  2. Write a utility method that takes the dataframe and class name (here SampleRecord).
  3. Use reflection to read the @JsonProperty annotations on the class's fields (you could also add your own annotations if you need to define specific properties).
  4. Rename the columns with withColumnRenamed() on the dataframe.
  5. Return the modified dataframe.
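The steps above could be sketched roughly as follows, assuming Spark 2.4's Java API and jackson-annotations on the classpath. JsonPropertyRenamer and its methods are hypothetical names, and this version only reads annotations placed on fields (as in the question's SampleRecord), not on setters.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

/** Hypothetical helper: renames dataframe columns using @JsonProperty values on a class's fields. */
public final class JsonPropertyRenamer {

    private JsonPropertyRenamer() {}

    /** Step 3: collect fieldName -> @JsonProperty value via reflection. */
    public static Map<String, String> renames(Class<?> clazz) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Field field : clazz.getDeclaredFields()) {
            JsonProperty annotation = field.getAnnotation(JsonProperty.class);
            // Skip unannotated fields and @JsonProperty without an explicit name.
            if (annotation != null && !annotation.value().isEmpty()) {
                mapping.put(field.getName(), annotation.value());
            }
        }
        return mapping;
    }

    /** Steps 2, 4 and 5: rename matching columns and return the new dataframe. */
    public static Dataset<Row> renameColumns(Dataset<Row> df, Class<?> clazz) {
        Dataset<Row> result = df;
        for (Map.Entry<String, String> e : renames(clazz).entrySet()) {
            // withColumnRenamed is a no-op when the column does not exist.
            result = result.withColumnRenamed(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

For example, JsonPropertyRenamer.renameColumns(spark.createDataFrame(records, SampleRecord.class), SampleRecord.class), where records is a List<SampleRecord>, should turn the bean-derived columns sampleId, name and description into sample_id, sample_name and sample_desc.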

HIH

jgp
  • This won't solve the use case: we are trying to rename the columns using @JsonProperty so that withColumnRenamed() can be avoided. – Sudev Ambadi Jan 25 '19 at 13:25
  • I understand you would like to do that before the ingestion starts, but why? Why don't you want to use withColumnRenamed()? – jgp Jan 25 '19 at 13:29
  • So there is no direct way to create a Dataset and let Jackson handle the column names for the encoder class? – Arjav96 Jan 25 '19 at 16:00
  • I must admit I do not have much experience with Dataset<T>; my teams and I usually use dataframes (Dataset<Row>) so we can leverage Tungsten. When you ingest in Tungsten, you can specify a schema, and you could automatically build your schema from the annotations. – jgp Jan 26 '19 at 13:38
  • So is there a way to use Dataset<Row> df = spark.createDataFrame(records, SampleRecord.class) and build the schema using Jackson annotations? – Arjav96 Jan 26 '19 at 14:07
  • Where is your data originally: in a CSV file, JSON, a database? To get you started, you can look at https://freecontent.manning.com/ingesting-data-from-files-with-spark-part-1/ and, instead of .option("inferSchema", true), you can specify a schema with your column names. – jgp Jan 26 '19 at 16:59
  • To add to this: to rename nested columns, you can use https://sparkbyexamples.com/spark/rename-a-column-on-spark-dataframes/#using-structtype – DanMatlin Jan 28 '20 at 04:30
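Following jgp's suggestion of building the schema from the annotations and passing it to the reader instead of inferSchema, here is a minimal sketch assuming Spark 2.4's Java API. JsonPropertySchema is a hypothetical helper that supports only a few simple field types. Because CSV columns are matched by position, it also relies on getDeclaredFields() returning fields in declaration order, which HotSpot does in practice but the Javadoc does not guarantee.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

/** Hypothetical helper: builds a Spark schema whose column names come from @JsonProperty. */
public final class JsonPropertySchema {

    private JsonPropertySchema() {}

    // Assumption: only a handful of simple field types are needed; extend as required.
    private static DataType toSparkType(Class<?> type) {
        if (type == Long.class || type == long.class) return DataTypes.LongType;
        if (type == Integer.class || type == int.class) return DataTypes.IntegerType;
        if (type == Double.class || type == double.class) return DataTypes.DoubleType;
        if (type == Boolean.class || type == boolean.class) return DataTypes.BooleanType;
        if (type == String.class) return DataTypes.StringType;
        throw new IllegalArgumentException("Unsupported field type: " + type.getName());
    }

    public static StructType schemaFor(Class<?> clazz) {
        List<StructField> fields = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            // Skip constants (e.g. serialVersionUID) and compiler-generated fields (e.g. this$0).
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            JsonProperty annotation = field.getAnnotation(JsonProperty.class);
            // Fall back to the Java field name when there is no explicit @JsonProperty name.
            String name = (annotation != null && !annotation.value().isEmpty())
                    ? annotation.value()
                    : field.getName();
            fields.add(DataTypes.createStructField(name, toSparkType(field.getType()), true));
        }
        return DataTypes.createStructType(fields);
    }
}
```

The resulting schema could then be passed to the reader, e.g. spark.read().option("header", "true").schema(JsonPropertySchema.schemaFor(SampleRecord.class)).csv(path), where path is your own input location.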