
I want to join two Datasets, DS1 and DS2, to get DS3.

DS1 :

+---------+--------------------+-----------+------------+
|Compte   |         Lib        |ReportDebit|ReportCredit|
+---------+--------------------+-----------+------------+
|   447105|Autres impôts, ta...|    77171.0|         0.0|
|   753000|Jetons de présenc...|     6839.0|         0.0|
|   511107|Valeurs à l’encai...|        0.0|     77171.0|
+---------+--------------------+-----------+------------+

DS2:

+---------+------------+
|Compte   |SoldeBalance|
+---------+------------+
|   447105|      992.13|
|   111111|     35065.0|
+---------+------------+

I want to get DS3 like this:

+---------+--------------------+-----------+------------+------------+
|Compte   |           CompteLib|ReportDebit|ReportCredit|SoldeBalance|
+---------+--------------------+-----------+------------+------------+
|   447105|Autres impôts, ta...|    77171.0|         0.0|      992.13|
|   753000|Jetons de présenc...|     6839.0|         0.0|         0.0|
|   511107|Valeurs à l’encai...|        0.0|     77171.0|         0.0|
|   111111|                    |        0.0|         0.0|     35065.0|
+---------+--------------------+-----------+------------+------------+

Can somebody show me a sample Spark Java expression for this? Thanks in advance.

zero323
OOvic
Welcome to Stack Overflow. You posted [exactly the same question](https://stackoverflow.com/questions/50160319/how-add-a-column-from-dataset-ds1-to-dataset-ds2-spark-java-api) an hour ago, and it was closed as a duplicate. Please don't abuse the site by deleting and reposting questions. If the duplicate doesn't answer your question, [edit] the question and describe the problems you've faced. Also be sure to follow the instructions in [How to ask](https://stackoverflow.com/help/how-to-ask) and provide a reproducible example ([mcve], [Repr. Spark Example](https://stackoverflow.com/q/48427185)) – zero323 May 03 '18 at 18:41

1 Answer


You can achieve this with a full outer join on Compte, followed by replacing the resulting null values with defaults: 0.0 for the numeric columns and an empty string for Lib.

import static org.apache.spark.sql.functions.*;

...

// Full outer join keeps rows whose Compte exists in only one of the two Datasets.
ds1.join(ds2, ds1.col("Compte").equalTo(ds2.col("Compte")), "full_outer")
        .select(ds1.col("Compte").alias("Compte1"),
                ds2.col("Compte").alias("Compte2"),
                ds1.col("Lib"),
                ds1.col("ReportDebit"),
                ds1.col("ReportCredit"),
                ds2.col("SoldeBalance"))
        // Take the key from DS1 when present, otherwise from DS2.
        .withColumn("Compte", coalesce(col("Compte1"), col("Compte2")))
        .drop("Compte1", "Compte2")
        // Replace the nulls produced by unmatched rows with defaults.
        .na().fill(0.0, new String[] { "ReportDebit", "ReportCredit", "SoldeBalance" })
        .na().fill("", new String[] { "Lib" })
        .show();

Output:

+--------------------+-----------+------------+------------+------+
|                 Lib|ReportDebit|ReportCredit|SoldeBalance|Compte|
+--------------------+-----------+------------+------------+------+
|Valeurs à l’encai...|        0.0|     77171.0|         0.0|511107|
|Autres impôts, ta...|    77171.0|         0.0|      992.13|447105|
|                    |        0.0|         0.0|     35065.0|111111|
|Jetons de présenc...|     6839.0|         0.0|         0.0|753000|
+--------------------+-----------+------------+------------+------+
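
To make the semantics of the join and the null replacement concrete, here is a minimal plain-Java sketch that does the same thing with ordinary maps instead of Spark. It is only an illustration, not Spark code: the class name `FullOuterJoinSketch`, the `Ds1Row` type, and the `lib-...` label values are hypothetical (the real labels in the question are truncated), and the rows come out in insertion order, whereas Spark gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FullOuterJoinSketch {

    // Hypothetical stand-in for the non-key columns of a DS1 row.
    static final class Ds1Row {
        final String lib;
        final double reportDebit;
        final double reportCredit;

        Ds1Row(String lib, double reportDebit, double reportCredit) {
            this.lib = lib;
            this.reportDebit = reportDebit;
            this.reportCredit = reportCredit;
        }
    }

    // Full outer join on Compte: every key found on either side yields one row.
    // Missing values are replaced by "" (Lib) or 0.0 (numeric columns).
    static List<String> fullOuterJoin(Map<Integer, Ds1Row> ds1, Map<Integer, Double> ds2) {
        Set<Integer> keys = new LinkedHashSet<>(ds1.keySet());
        keys.addAll(ds2.keySet());

        List<String> rows = new ArrayList<>();
        for (Integer compte : keys) {
            Ds1Row left = ds1.get(compte);   // null when the key exists only in DS2
            Double right = ds2.get(compte);  // null when the key exists only in DS1

            String lib = (left == null) ? "" : left.lib;
            double debit = (left == null) ? 0.0 : left.reportDebit;
            double credit = (left == null) ? 0.0 : left.reportCredit;
            double solde = (right == null) ? 0.0 : right;

            rows.add(compte + "|" + lib + "|" + debit + "|" + credit + "|" + solde);
        }
        return rows;
    }

    // Builds the sample data from the question (with placeholder labels) and joins it.
    static List<String> demo() {
        Map<Integer, Ds1Row> ds1 = new LinkedHashMap<>();
        ds1.put(447105, new Ds1Row("lib-447105", 77171.0, 0.0));
        ds1.put(753000, new Ds1Row("lib-753000", 6839.0, 0.0));
        ds1.put(511107, new Ds1Row("lib-511107", 0.0, 77171.0));

        Map<Integer, Double> ds2 = new LinkedHashMap<>();
        ds2.put(447105, 992.13);
        ds2.put(111111, 35065.0);

        return fullOuterJoin(ds1, ds2);
    }

    public static void main(String[] args) {
        for (String row : demo()) {
            System.out.println(row);
        }
    }
}
```

The key 111111 exists only in DS2, so its row gets an empty Lib and 0.0 for both report columns. The keys 753000 and 511107 exist only in DS1, so they get 0.0 for SoldeBalance.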
Mousa