
I am trying to run the query below:

df3 = df1.join(df2, df1["DID"] == df2["JID"],'inner')\
          .select(df1["DID"],df1["amt"]-df2["amt"]\
          .where(df1["DID"]== "BIG123")).show()

I get error as shown below:

TypeError                                 Traceback (most recent call last)
TypeError: 'Column' object is not callable

What is the issue with the query and how do I fix it?

OneCricketeer
Nick Ryan

2 Answers


There is a syntax issue in your query: the closing bracket for `select` comes after the `where` block, so `.where(...)` is invoked on the `Column` `df2["amt"]` instead of on the DataFrame, which raises `TypeError: 'Column' object is not callable`. Below is the syntactically correct query.

df3 = df1.join(df2, df1["DID"] == df2["JID"],'inner')\
          .select(df1["DID"],df1["amt"]-df2["amt"])\
          .where(df1["DID"]== "BIG123").show()
Brijesh

Try this code:

from pyspark.sql import functions as F

df3 = df1.join(df2, df1["DID"] == df2["JID"], how='inner')\
         .select("DID", df1["amt"] - df2["amt"])\
         .where(F.col("DID") == "BIG123").show()

abakar