2

I'm experimenting with Spark and Spark SQL and I need to concatenate a value at the beginning of a string field that I retrieve as output from a select (with a join) like the following:

val result = sim.as('s)   
    .join(
        event.as('e),
        Inner,
        Option("s.codeA".attr === "e.codeA".attr))   
    .select("1"+"s.codeA".attr, "e.name".attr)  

Let's say my tables contain:

sim:

codeA,codeB
0001,abcd
0002,efgh

events:

codeA,name
0001,freddie
0002,mercury

And I would want as output:

10001,freddie
10002,mercury

In SQL or HiveQL I know I have the concat function available, but it seems Spark SQL doesn't support this feature. Can somebody suggest me a workaround for my issue?

Thank you.

Note: I'm using Language Integrated Queries but I could use just a "standard" Spark SQL query, in case of eventual solution.

zero323
  • 322,348
  • 103
  • 959
  • 935
erond
  • 270
  • 5
  • 15
  • Possible duplicate of [Concatenate columns in apache spark dataframe](http://stackoverflow.com/questions/31450846/concatenate-columns-in-apache-spark-dataframe) – zero323 May 29 '16 at 19:51
  • I think you don't need to do much of work like using map and creating function.just use this "val result = sim.as("s").join(events, sim("codeA")===events("codeA"),"inner") .select("s.codeA","name").withColumn("codeA",concat(lit("1"),lit(""),col("codeA"))).show " – Mahesh Gupta Oct 07 '19 at 05:33

2 Answers2

0

The output you add in the end does not seem to be part of your selection, or your SQL logic, if I understand correctly. Why don't you proceed by formatting the output stream as a further step ?

val results = sqlContext.sql("SELECT s.codeA, e.code FROM foobar")
results.map(t => "1" + t(0), t(1)).collect()
Francois G
  • 11,957
  • 54
  • 59
  • Hi @huitseeker, thanks for your answer. Your solution might be actually an option, since I have to format the output as tab separated values, indeed. However, the issue is still valide, since I also need to perform another join with the "concatenated-value", i.e. something like: `Option("s.codeA".attr === 1+"r.codeA".attr)` where r has codeA values like 10001, 10002, ... – erond Oct 13 '14 at 17:27
0

It's relatively easy to implement new Expression types directly in your project. Here's what I'm using:

case class Concat(children: Expression*) extends Expression {
  override type EvaluatedType = String

  override def foldable: Boolean = children.forall(_.foldable)
  def nullable: Boolean = children.exists(_.nullable)
  def dataType: DataType = StringType

  def eval(input: Row = null): EvaluatedType = {
    children.map(_.eval(input)).mkString
  }
}

val result = sim.as('s)
    .join(
        event.as('e),
        Inner,
        Option("s.codeA".attr === "e.codeA".attr))
    .select(Concat("1", "s.codeA".attr), "e.name".attr)
Nathan Howell
  • 4,627
  • 1
  • 22
  • 30