I am trying to join two DataFrames that have the same column names and compute some new values. After that, I need to drop all columns of the second table. The number of columns is huge. How can I do this more easily? I tried `.drop("table2.*")`, but this doesn't work.
7
-
1 Even `.drop("table2.specificColumnName")` doesn't work, let alone `.drop("table2.*")`. – Bikash Gyawali Jul 02 '19 at 16:17
-
1 Could someone explain why `drop("foo.column")` doesn't work? – wrschneider Aug 18 '21 at 13:53
-
Not sure... it does work in principle, but when you have many columns, it's not feasible. You can then do `.drop('table2.x1', 'table2.x2', 'table2.x3')`, but again, if you have a lot of columns this won't work. And you can't always just say `.drop('table2.*')`, as you might want to keep some columns. – GenDemo Jun 17 '22 at 04:06
2 Answers
6
You can use `select` with aliases:
df1.alias("df1")
.join(df2.alias("df2"), Seq("someJoinColumn"))
.select($"df1.*", $"someComputedColumn", ...)
Alternatively, reference the parent DataFrame:
df1.join(df2, Seq("someJoinColumn")).select(df1("*"), $"someComputedColumn", ...)
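For completeness, here is a minimal, self-contained sketch of the alias approach. The key column `id`, the `amount` and `label` columns, the aliases `t1`/`t2`, and the object and method names are all hypothetical, chosen only for illustration. It assumes an inner join on a single shared key and a local Spark session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KeepLeftColumns {
  // Joins on the (hypothetical) key column "id", computes one derived column
  // from both sides, and keeps only the left-hand columns plus that result.
  def joinKeepLeft(left: DataFrame, right: DataFrame): DataFrame =
    left.alias("t1")
      .join(right.alias("t2"), Seq("id"))
      .select(
        col("t1.*"), // every column of the first table
        (col("t1.amount") - col("t2.amount")).as("amount_diff")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-left-columns")
      .getOrCreate()
    import spark.implicits._

    // Two tables with identical column names, as in the question.
    val table1 = Seq((1, 10.0, "a"), (2, 20.0, "b")).toDF("id", "amount", "label")
    val table2 = Seq((1, 4.0, "x"), (2, 5.0, "y")).toDF("id", "amount", "label")

    joinKeepLeft(table1, table2).show()
    spark.stop()
  }
}
```

Because the computed column is built inside `select`, nothing from the second table needs to be dropped afterwards: its columns are simply never selected.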

zero323
-1
Instead of dropping, you can select only the columns that you want to keep for further operations, for example:
val newDataFrame = joinedDataFrame.select($"col1", $"col4", $"col6")
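When there are too many columns to list by hand, the select list can be built from the first DataFrame's schema. The sketch below is one possible way to do that; the helper name `leftColumnsPlus` is hypothetical. It assumes the two DataFrames are not derived from one another (for a self-join, aliases are the safer route) and that column names contain no dots.

```scala
import org.apache.spark.sql.{Column, DataFrame}

object SelectLeftColumns {
  // Returns every column of `left`, resolved through `left` itself so that
  // names shared with the other table stay unambiguous, followed by `extra`.
  def leftColumnsPlus(left: DataFrame, extra: Seq[Column]): Seq[Column] =
    left.columns.toSeq.map(name => left(name)) ++ extra
}
```

A possible usage, with hypothetical `id` and `amount` columns: `df1.join(df2, Seq("id")).select(SelectLeftColumns.leftColumnsPlus(df1, Seq((df1("amount") - df2("amount")).as("amount_diff"))): _*)`.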

Prasad Khode
-
1 That doesn't help if I have something like 50 columns in the first table plus 50 in the second. Can I select "table1.*" plus the names of the new columns? – Mike Feb 21 '17 at 10:13