7

I am trying to join two DataFrames that have the same column names and compute some new values. After that, I need to drop all columns of the second table. The number of columns is huge. How can I do this more easily? I tried .drop("table2.*"), but this doesn't work.

Mike
  • 1
    even `.drop("table2.specificColumnName")` doesn't work; forget `.drop("table2.*")`. – Bikash Gyawali Jul 02 '19 at 16:17
  • 1
    Could someone explain why `drop("foo.column")` doesn't work? – wrschneider Aug 18 '21 at 13:53
  • Not sure... it does work in principle... but when you have many columns, it's not feasible. You can then do `.drop('table2.x1', 'table2.x2', 'table2.x3')`, but again, if you have a lot of columns this won't work. And you can't always just say `.drop('table2.*')`, as you might want to keep some columns. – GenDemo Jun 17 '22 at 04:06

2 Answers

6

You can use select with aliases:

df1.alias("df1")
  .join(df2.alias("df2"), Seq("someJoinColumn"))
  .select($"df1.*", $"someComputedColumn", ...)

Or reference the parent DataFrame:

df1.join(df2, Seq("someJoinColumn")).select(df1("*"), $"someComputedColumn", ...)
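For readers who want to try the alias approach end to end, here is a minimal sketch. It assumes a local SparkSession and an inner join; the object name `KeepLeftColumns`, the column names `id`, `a`, `b`, and the computed column `a_diff` are all hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KeepLeftColumns {
  // Inner-joins on the (hypothetical) key "id", keeps every left-side column
  // via the alias "l", and appends one computed column built from both sides.
  def joinKeepLeft(left: DataFrame, right: DataFrame): DataFrame =
    left.alias("l")
      .join(right.alias("r"), Seq("id"))
      .select(col("l.*"), (col("l.a") - col("r.a")).as("a_diff"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-left-columns")
      .getOrCreate()
    import spark.implicits._

    // Two small frames that share all their column names (illustrative data).
    val df1 = Seq((1, 10, 100), (2, 20, 200)).toDF("id", "a", "b")
    val df2 = Seq((1, 4, 40), (2, 5, 50)).toDF("id", "a", "b")

    // Only df1's columns plus a_diff remain; nothing from df2 is listed by name.
    joinKeepLeft(df1, df2).show()

    spark.stop()
  }
}
```

The computed column is defined inside the same `select`, while both aliases are still in scope, so no right-side column has to be dropped afterwards.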
zero323
-1

Instead of dropping, you can select only the columns that you want to keep for further operations, something like this:

val newDataFrame = joinedDataFrame.select($"col1", $"col4", $"col6")
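When there are too many columns to type out, the select list can be built from the first DataFrame's `columns` array instead. The following is only a sketch under assumptions: the object name `SelectLeftPlusNew`, the helper `keepLeftPlus`, the key `id`, and the computed column `a_total` are hypothetical, and the two DataFrames are assumed to come from different sources (for a self-join, aliases are the safer route).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object SelectLeftPlusNew {
  // Builds the select list from left.columns, so no column name is typed by hand,
  // then appends whatever computed columns the caller passes in `extra`.
  def keepLeftPlus(left: DataFrame, right: DataFrame, extra: Seq[Column]): DataFrame = {
    val leftCols: Seq[Column] = left.columns.toSeq.map(name => left(name))
    left.join(right, Seq("id")).select((leftCols ++ extra): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-left-plus-new")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: both frames share all their column names.
    val df1 = Seq((1, 10, 100), (2, 20, 200)).toDF("id", "a", "b")
    val df2 = Seq((1, 4, 40), (2, 5, 50)).toDF("id", "a", "b")

    // A computed column that uses both sides, referenced through the parent frames.
    val total = (df1("a") + df2("a")).as("a_total")

    keepLeftPlus(df1, df2, Seq(total)).show()

    spark.stop()
  }
}
```

This keeps the "select what you need" idea but scales to 50 + 50 columns, because the list of kept columns is derived from `df1.columns` rather than written out.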
Prasad Khode
  • 1
    That's not practical if I have something like 50 columns plus 50 columns in the second table. Can I select "table1.*" plus the names of the new columns? – Mike Feb 21 '17 at 10:13