
I need to interrupt the program and raise the exception below if two conditions are met, and otherwise have the program continue. This works fine when using only the first condition, but yields an error when using both. The code below should raise the exception if the DF's row count is non-zero and the value of DF.col1 is not 'string'. Any tips to get this working?

if (DF.count() > 0) & (DF.col1 != 'string'): 
  raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")
else: 
  pass 

This throws the error:

" Py4JError: An error occurred while calling o678.and. Trace: 
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist "

Some sample data:

from pyspark.sql.types import StructType,StructField, StringType, IntegerType

data2 = [("not_string","test")]

schema = StructType([ \
    StructField("col1",StringType(),True), \
    StructField("col2",StringType(),True) \
  ])
 
DF = spark.createDataFrame(data=data2,schema=schema)
DF.printSchema()
DF.show(truncate=False)
Dr.Data
  • Python uses the `and` keyword for conditions rather than `&` or `&&` as you might expect if you are coming from another language. Your expression should read: `if DF.count() > 0 and DF.col1 != "string":` – joshmeranda Mar 15 '22 at 20:36
  • Please make clear when an exception should occur as your current question and code are unclear. For example, should it be when the row count is bigger than zero and the *datatype* of `col1` is not string? Or when there exists a row where `col1` is not equal to the value`'string'`? Ideally you add a sample df with expected output. – ScootCork Mar 15 '22 at 21:17
  • @ScootCork I've edited the OP. The exception should be thrown when the count is non-zero and the value for col1 is not 'string'. – Dr.Data Mar 15 '22 at 21:32
  • Your df could have many rows, do you mean if there is any row in your df where col1 is not equal to 'string' it should raise an exception? – ScootCork Mar 15 '22 at 21:38
  • @ScootCork Yes, if there are any observations in the DF and col1 is not equal to the value "string"...that should produce the exception. – Dr.Data Mar 15 '22 at 21:49

2 Answers


IIUC you want to raise an exception if there are any rows in your dataframe where the value of col1 is not equal to 'string'.

You can do this with a filter and a count. If any rows have col1 not equal to 'string', the count will be greater than 0, which is truthy and raises your exception.

from pyspark.sql.types import StructType, StructField, StringType

data2 = [("not_string", "test")]

schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", StringType(), True)
])

DF = spark.createDataFrame(data=data2, schema=schema)

# Count only the rows where col1 is not equal to 'string';
# a non-zero count is truthy and triggers the exception.
if DF.filter(DF.col1 != 'string').count():
    raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")

Exception: !!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!
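
If you'd rather keep the original two-condition structure, a minimal sketch (reusing the DF defined above) is to short-circuit on the total row count first and then count the offending rows with Python's `and`:

# Minimal sketch, reusing the DF defined above.
# Python's `and` short-circuits, so the per-row check only runs
# when the dataframe has at least one row.
if DF.count() > 0 and DF.filter(DF.col1 != 'string').count() > 0:
    raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")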
ScootCork

In Python, the & operator is a bitwise operator that performs a bit-by-bit operation on its operands. For logical "and" in conditions you should use the `and` keyword:

if (DF.count() > 0) and (DF.col1 != 'string'): 
  raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")
else: 
  pass 
PApostol
  • I tried this and received another error: "ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions." – Dr.Data Mar 15 '22 at 20:53
  • I think that's because you're comparing a column object (`DF.col1 != 'string'`) rather than the actual data in the DF - see answers [here](https://stackoverflow.com/questions/48282321/valueerror-cannot-convert-column-into-bool) for more details, and the sketch after these comments. – PApostol Mar 15 '22 at 20:59
  • Test your code, preferably before posting it – David דודו Markovitz Mar 15 '22 at 21:21
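
To illustrate the point in the comment above, a minimal sketch (assuming the DF from the question is in scope): `DF.col1 != 'string'` builds a Column expression rather than a Python boolean, so it cannot be used directly in an `if`; it has to be evaluated against the data, for example with a filter and a count:

# Minimal sketch, assuming the DF from the question is in scope.
cond = DF.col1 != 'string'
print(type(cond))  # <class 'pyspark.sql.column.Column'>, not a Python bool

# Evaluate the column expression against the data instead of
# passing it straight to a Python `if`:
if DF.filter(cond).count() > 0:
    raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")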