I am new to PySpark and have a question.

I have a df like this:

+---+---+-------+
| a1| a2|formula|
+---+---+-------+
| 18| 12|  a1+a2|
| 11|  1|  a1-a2|
+---+---+-------+

I'm trying to parse the 'formula' column to create a new column containing the evaluated result of each formula, so that I get a df like this:

+---+---+-------+----------------+
| a1| a2|formula|resolved_formula|
+---+---+-------+----------------+
| 18| 12|  a1+a2|              30|
| 11|  1|  a1-a2|              10|
+---+---+-------+----------------+

I have tried using:

df2 = df.withColumn('resolved_formula', f.expr(df.formula))
df2.show()

but I'm getting this TypeError:

TypeError: Column is not iterable

Can someone help me?

Thank you very much!!

DaniloP
  • Does this answer your question? [Evaluate formulas in Spark DataFrame](https://stackoverflow.com/questions/66707384/evaluate-formulas-in-spark-dataframe) – vladsiv Sep 13 '22 at 12:17
  • I do not know if it is possible, but storing formulas in a table like this is a terrible data model. Before implementing this, you should probably review your data model. – Steven Sep 13 '22 at 12:44

1 Answer

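`f.expr()` expects a SQL expression as a string, not a Column, which is why `f.expr(df.formula)` raises the TypeError. Here's a somewhat roundabout way of doing what you intend.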

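from pyspark.sql import functions as func

# copy the formula into a working column that will be rewritten
data_sdf = data_sdf. \
    withColumn('new_formula', func.col('formula'))

# prefix every data column name with `r.` so the formula can be evaluated against a Row named `r`
# (this could also be done in a single regex); `\b` keeps `a1` from matching inside e.g. `a10`
for column in data_sdf.columns:
    if column not in ('formula', 'new_formula'):
        data_sdf = data_sdf. \
            withColumn('new_formula', func.regexp_replace('new_formula', r'\b' + column + r'\b', 'r.' + column))

# use `eval()` to evaluate the rewritten formula for each row
# note: `eval()` executes arbitrary Python, so only use it on formulas you trust
data_sdf. \
    rdd. \
    map(lambda r: (r.a1, r.a2, r.formula, eval(r.new_formula))). \
    toDF(['a1', 'a2', 'formula', 'resolved_formula']). \
    show()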

# +---+---+-------+----------------+
# | a1| a2|formula|resolved_formula|
# +---+---+-------+----------------+
# | 18| 12|  a1+a2|              30|
# | 11|  1|  a1-a2|              10|
# +---+---+-------+----------------+
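
The prefixing step can also be done with a single regex, as noted in the comment above. Below is a minimal plain-Python sketch of that idea, not Spark code: `Row` here is a hypothetical namedtuple standing in for a Spark Row so it runs without a cluster, and `prefix_columns` and `resolve` are illustrative helper names, not part of any library.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a Spark Row, so the sketch runs without Spark.
Row = namedtuple("Row", ["a1", "a2", "formula"])

def prefix_columns(formula, columns, prefix="r."):
    """Prefix every whole-word column name in `formula` with `prefix`."""
    # One alternation over all column names; \b keeps 'a1' from
    # matching inside a longer name such as 'a10'.
    names = sorted(columns, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(c) for c in names) + r")\b"
    return re.sub(pattern, lambda m: prefix + m.group(1), formula)

def resolve(row, columns):
    """Evaluate row.formula against the row's own values."""
    expression = prefix_columns(row.formula, columns)
    # eval() runs arbitrary code: only use it on formulas you trust.
    return eval(expression, {"__builtins__": {}}, {"r": row})

rows = [Row(18, 12, "a1+a2"), Row(11, 1, "a1-a2")]
results = [resolve(r, ["a1", "a2"]) for r in rows]
print(results)  # [30, 10]
```

In the Spark version, `resolve` would presumably be what gets passed to `rdd.map`, with the column list taken from `data_sdf.columns` minus the formula column.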
samkart