0

I have 2 dataframes below:

Course:

-------------------------—
Name    | Course
-------------------------—
Tom     | English
Tom     | Mathematics
Lily    | English
Lily    | Chinese

Score:

-------------------------—-------—-------—
Name    | English | Chinese | Mathematics
-------------------------—-------—-------—
Tom     | 78      | 0       | 90
Lily    | 85      | 88      | 0

I want to use course value in course table to join the specified column in score table to get the value. then get the result as below, only 2 course columns will be displayed.

Result:

-------------------------—--
Name    | Course1 | Course2
-------------------------—--
Tom     | 78      | 90
Lily    | 85      | 88
bad_coder
  • 11,289
  • 20
  • 44
  • 72
Lordan
  • 23
  • 2
  • I am working on databricks using pyspark – Lordan Jul 13 '20 at 02:40
  • 3
    and you tried what? I mean, where is the Python code? Here is by the way how to use the value of one column in a dataframe to get a column in another dataframe: https://stackoverflow.com/questions/48432894/pyspark-how-to-use-a-row-value-from-one-column-to-access-another-column-which-h - the rest is just to put it into the join, so your done – UninformedUser Jul 13 '20 at 03:24

1 Answers1

0

You can achieve this by unpivoting the dataframe, then joining the both dataframe then pivoting the resultant dataframe

df1 = spark.createDataFrame([('Tom','English'),
('Tom','Mathematics'),
('Lily','English'),
('Lily','Chinese')],['Name','Course'])

df2 = spark.createDataFrame([('Tom',78,0,90),('Lily',85,88,0)],['Name','English','Chinese','Mathematics'])



df3 = df2.select('Name', expr('''stack(3,'English',English,'Chinese',Chinese,'Mathematics',Mathematics) as (Course, score)'''))

df4 = df3.join(df1, (df1.Name==df3.Name) & (df1.Course==df3.Course), 'inner').select(df1.Name,df1.Course,df3.score).\
groupBy('Name').pivot('Course').agg(min('score'))

df4.show()

+----+-------+-------+-----------+
|Name|Chinese|English|Mathematics|
+----+-------+-------+-----------+
| Tom|   null|     78|         90|
|Lily|     88|     85|       null|
+----+-------+-------+-----------+
Shubham Jain
  • 5,327
  • 2
  • 15
  • 38