
I have the following code:

DataFrame addressDF = sqlContext.read().parquet(addressParquetPath);
DataFrame propertyDF = sqlContext.read().parquet(propertyParquetPath);

DataFrame joinedFrame = addressDF.join(propertyDF, propertyDF.col("LOCID").equalTo(addressDF.col("locid")), "left");

joinedFrame.registerTempTable("joinedFrame");
DataFrame joinedFrameSelect = sqlContext.sql("SELECT LOCID,AddressID FROM joinedFrame");

In the select, LOCID is listed twice. How do I pick the LOCID of address instead of the one from property?

Can I execute a select on the DataFrame by column index?

Jacek Laskowski
user3897533

1 Answer


I usually rename the column on one side before joining. You can try:

...join(propertyDF.withColumnRenamed("LocID", "LocID_R"), ...

Or, if you want to change all of the column names of a DataFrame in one go, such as appending _R (for "right") to every name, you can try this:

df.toDF(df.columns.map(_ + "_R"):_*)

This is useful when you are joining a DataFrame back onto itself.
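
Since the question is in Java and the one-liner above is Scala, here is a rough Java sketch of the same renaming step. It only builds the suffixed names in plain Java. ColumnSuffix and withSuffix are hypothetical names made up for illustration, not part of Spark, and the sketch assumes Java 8. The resulting array is what you would then hand to toDF(String...) on the right-hand DataFrame.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Spark): appends a suffix to every column name.
final class ColumnSuffix {

    // Java counterpart of the Scala expression df.columns.map(_ + "_R")
    static String[] withSuffix(String[] columns, String suffix) {
        return Arrays.stream(columns)
                .map(name -> name + suffix)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Stand-in for propertyDF.columns(); the names are taken from the question.
        String[] columns = {"LOCID", "AddressID"};
        String[] renamed = withSuffix(columns, "_R");
        System.out.println(Arrays.toString(renamed));
        // Assumed usage with a real DataFrame:
        //   DataFrame right = propertyDF.toDF(withSuffix(propertyDF.columns(), "_R"));
    }
}
```

After that, the join condition would refer to the renamed column (for example LOCID_R), so LOCID in the select is no longer ambiguous.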

David Griffin