-1

I'm trying to write a snippet to get results for OR with NOT EQUAL condition but my code isn't working.

Col1 Col2
12345;34567;67890 Text
34567;98765;65432 Text

My condition is the cell should NOT contain 12345 OR 67890. So ideally I want FALSE for first row and TRUE for second row. Below is the code I wrote

(~F.col("Col1").like("%12345%")) | (~F.col("Col1").like("%67890%"))

But the result from the above code is TRUE for both the rows. Will someone please correct me where I'm getting it wrong?

Rajesh Pandya
  • 1,540
  • 4
  • 18
  • 31

3 Answers3

0

You can use XOR to evaluate that only one of the conditions its true.

condition1 = F.col("Col1").like("%12345%")
condition2 = F.col("Col1").like("%67890%")

result = (condition1 & ~(condition2)) | (~(condition1) & condition2)

Output of: df.withColumn("Result", result).show():

Col1 Col2 Result
12345;34567;67890 Text false
34567;98765;65432 Text false
12345;98765;65432 Text true

pyspark when:

condition1 = F.col("Col1").like("%12345%")
condition2 = F.col("Col1").like("%67890%")

result = (
    F.when(condition1 & ~condition2, "Condition 1 True")
    .when(~condition1 & condition2, "Condition 2 True")
    .otherwise("Neither condition is True")
)

Output of: df.withColumn("Result", result).show():

Col1 Col2 Result
12345;34567;67890 Text Neither condition...
34567;98765;65432 Text Neither condition...
12345;98765;65432 Text Condition 1 True

XOR works this way:

True ^ True == False
True ^ False == True
False ^ True == True
False ^ False == False

You can negate the value to obtain true if both of the conditions are the same value:

not (condition ^ condition)

Example:

not (True ^ True) == True
not (True ^ False) == False
not (False ^ True) == False
not (False ^ False) == True
  • Thank you for sharing the logic. But I'm getting below error: "^ is not a supported operation for types Column and Column. Please review your code." – SaurabhShelar Apr 21 '23 at 04:18
  • @SaurabhShelar Modified to use pyspark "when" – Faku Venturi Apr 21 '23 at 04:51
  • @SaurabhShelar improved – Faku Venturi Apr 21 '23 at 05:24
  • @Fake Venturi My expected output is a Boolean value. I feel your workaround is little extra as what if we have multiple conditions to be satisfied instead of just 2 as in this case. I'm looking for a solution which gives me a Boolean value FALSE even if one of the condition is FALSE. – SaurabhShelar Apr 21 '23 at 05:46
  • `(~F.col("Col1").like("%12345%")) & (~F.col("Col1").like("%67890%"))` True if it contains neither, False if it contains one or both. @SaurabhShelar – Faku Venturi Apr 21 '23 at 06:07
0

My condition is the cell should NOT contain 12345 OR 67890.

Despite the way we express this in everyday English, you should just use & instead of |.

The logic is that the string does not contain 12345 and it also does not contain 67809.

The Photon
  • 1,242
  • 1
  • 9
  • 12
0

You can use rlike with the pipe/or (|):

out = df.withColumn("Result", ~F.col("Col1").rlike("12345|67890"))

Or following your approach, you need to add the missing parenthesis :

(~(F.col("Col1").like("%12345%")) | ~(F.col("Col1").like("%67890%")))

​ Out :

out.show(truncate=False)
​
+-----------------+----+------+
|Col1             |Col2|Result|
+-----------------+----+------+
|12345;34567;67890|Text|false |
|34567;98765;65432|Text|true  |
+-----------------+----+------+
Timeless
  • 22,580
  • 4
  • 12
  • 30