
I have an org.apache.spark.sql.DataFrame that I would like to convert into a column (org.apache.spark.sql.Column). This is my DataFrame: val filled_column2 = x.select(first("col1", ignoreNulls = true).over(window)), and I want to convert it into a Spark SQL Column. Could anyone help with that? Thank you.
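For context, a minimal sketch of the setup (the window spec here is an assumption reconstructed from the output schema quoted in the comments further down); note that first(...).over(window) is itself already a Column, and it only becomes a DataFrame once it is wrapped in select:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.first

// Hypothetical window matching "PARTITION BY x1 ORDER BY x2" from the comments below.
val window = Window.partitionBy("x1").orderBy("x2")

// This expression is an org.apache.spark.sql.Column...
val filledCol = first("col1", ignoreNulls = true).over(window)

// ...and select(...) is what turns it into a DataFrame.
val filled_column2 = x.select(filledCol)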


@Jaime Caffarel: this is exactly what I am trying to do; the screenshots below should give you more visibility. You may also check the error message in the second screenshot.

(screenshot: the filled_column2 DataFrame)

(screenshot: the error message)


1 Answer


From the documentation of the class org.apache.spark.sql.Column:

A column that will be computed based on the data in a DataFrame. A new column is constructed based on the input columns present in a DataFrame:

df("columnName") // On a specific DataFrame.
col("columnName") // A generic column no yet associcated with a DataFrame. col("columnName.field") // Extracting a struct field col("a.column.with.dots") // Escape . in column names. $"columnName" // Scala short hand for a named column. expr("a + 1") // A column that is constructed from a parsed SQL Expression. lit("abc") // A column that produces a literal (constant) value.

If filled_column2 is a DataFrame, you could do:

filled_column2("col1")

******** EDITED AFTER CLARIFICATION ************

OK, it seems to me that what you are trying to do is a JOIN operation. Assuming that product_id is a unique key for each row, I would do something like this:

val filled_column = df.select(df("product_id"), last("last_prev_week_nopromo", ignoreNulls = true).over(window))

This way, you are also selecting the product_id that you will use as the key. Then, you can do the following:

val promo_txn_cnt_seas_df2 = promo_txn_cnt_seas_df1
  .join(filled_column, promo_txn_cnt_seas_df1("product_id") === filled_column("product_id"), "inner")
  // .orderBy("product_id", "week") ... (the rest of the operations)
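For completeness, the window referenced above is never defined in the post; a hypothetical spec consistent with the orderBy("product_id", "week") hint would be:

import org.apache.spark.sql.expressions.Window

// Assumption: one partition per product, ordered chronologically by week.
val window = Window.partitionBy("product_id").orderBy("week")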

Is this what you are trying to achieve?

  • 'filled_column2: org.apache.spark.sql.DataFrame = [first(col1) OVER (PARTITION BY x1 ORDER BY x2 ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string]'; this is filled_column2 – Olfa2 Oct 11 '22 at 15:14
  • What's the type of x on `val x = filled_column2("col1")`? – Jaime Caffarel Oct 11 '22 at 15:44