How can I identify which kind of exception the column-renaming function below will raise, and how do I handle it in PySpark?

def rename_columnsName(df, columns):  # provide names in dictionary format
    if isinstance(columns, dict):
        for old_name, new_name in columns.items():
            df = df.withColumnRenamed(old_name, new_name)
        return df.show()
    else:
        raise ValueError("'columns' should be a dict, like {'old_name':'new_name', 'old_name_one more':'new_name_1'}")

How can I test it by generating an exception with a dataset?

Olaf Kock
Gamefic
  • What kind of handling do you want to do? Maybe you can check before calling withColumnRenamed if the column exists? This will allow you to do required handling for negative cases and handle those cases separately. – UtkarshSahu Aug 16 '20 at 11:32
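
The check suggested in this comment could look roughly like the sketch below. It relies on the documented behavior that `withColumnRenamed` is a no-op when the column does not exist, so a misspelled name raises nothing unless you check for it yourself. The function name `rename_existing_columns` and the wording of the error messages are illustrative assumptions, not part of the original question.

```python
def rename_existing_columns(df, columns):
    """Rename columns, raising ValueError if any old name is absent from df.

    Hypothetical helper: `df` is expected to be a PySpark DataFrame, but only
    its `columns` attribute and `withColumnRenamed` method are used.
    """
    if not isinstance(columns, dict):
        raise ValueError("'columns' should be a dict, like {'old_name': 'new_name'}")

    # withColumnRenamed silently ignores unknown columns, so check explicitly.
    missing = [name for name in columns if name not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")

    for old_name, new_name in columns.items():
        df = df.withColumnRenamed(old_name, new_name)
    return df
```

Returning the DataFrame instead of `df.show()` (which returns `None`) keeps the result usable by the caller; that is a design choice of this sketch.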

2 Answers

Here's an example of how to test a PySpark function that throws an exception. In this example, we're verifying that an exception is thrown if the sort order is "cats".

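import pytest
import quinn
from pyspark.sql.types import StringType

# `spark` is assumed to be a pytest fixture that provides a SparkSession;
# `create_df` is assumed to be available through quinn's SparkSession extensions.
def it_throws_an_error_if_the_sort_order_is_invalid(spark):
    source_df = spark.create_df(
        [
            ("jose", "oak", "switch"),
            ("li", "redwood", "xbox"),
            ("luisa", "maple", "ps4"),
        ],
        [
            ("name", StringType(), True),
            ("tree", StringType(), True),
            ("gaming_system", StringType(), True),
        ]
    )
    # The call must raise ValueError; excinfo captures the exception for inspection.
    with pytest.raises(ValueError) as excinfo:
        quinn.sort_columns(source_df, "cats")
    assert excinfo.value.args[0] == "['asc', 'desc'] are the only valid sort orders and you entered a sort order of 'cats'"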

Notice that the test is verifying the specific error message that's being provided.

You can provide invalid input to your rename_columnsName function and validate that the error message is what you expect.
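
Applied to the function from the question, that could look like the minimal sketch below. Two things here are assumptions: the function returns the DataFrame rather than `df.show()`, and `None` stands in for the DataFrame, which works only because the type check fails before any DataFrame method is called. The test uses a plain `try`/`except` so it runs without extra packages; inside a pytest suite the same check can be written with `pytest.raises` as in the example above.

```python
EXPECTED_MESSAGE = (
    "'columns' should be a dict, like "
    "{'old_name':'new_name', 'old_name_one more':'new_name_1'}"
)


def rename_columnsName(df, columns):
    # Same type check as in the question; returns df instead of df.show().
    if isinstance(columns, dict):
        for old_name, new_name in columns.items():
            df = df.withColumnRenamed(old_name, new_name)
        return df
    raise ValueError(EXPECTED_MESSAGE)


def test_rename_rejects_non_dict_columns():
    # A list is invalid input, so ValueError is raised before df is touched.
    try:
        rename_columnsName(None, ["old_name", "new_name"])
    except ValueError as exc:
        assert exc.args[0] == EXPECTED_MESSAGE
    else:
        raise AssertionError("expected ValueError for non-dict input")
```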

Some other tips:

Powers

I found a solution to this question: exceptions in PySpark can be handled the same way as in plain Python. For example:

def rename_columnsName(df, columns):  # provide names in dictionary format
    try:
        if isinstance(columns, dict):
            for old_name, new_name in columns.items():
                df = df.withColumnRenamed(old_name, new_name)
            return df.show()
        else:
            raise ValueError("'columns' should be a dict, like {'old_name':'new_name', 'old_name_one more':'new_name_1'}")
    except Exception as e:
        # Prints the error message instead of propagating it; the function then returns None.
        print(e)
Gamefic
  • AFAIK, we should not add `Exception` in the `except` clause. If we remove the outer try clause, this function would be easier to read and understand. Thus, we should handle the exceptions when calling this function. What would be the point in raising an error if we don't do that? – DRTorresRuiz May 31 '23 at 13:37
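
A rough sketch of what this comment describes: the function raises, and the caller catches only the error it expects. The names `rename_columns` and `rename_or_keep` are hypothetical, and falling back to the unchanged DataFrame is just one possible way for a caller to react.

```python
def rename_columns(df, columns):
    """Rename columns; invalid input raises ValueError (no try/except here)."""
    if not isinstance(columns, dict):
        raise ValueError("'columns' should be a dict, like {'old_name': 'new_name'}")
    for old_name, new_name in columns.items():
        df = df.withColumnRenamed(old_name, new_name)
    return df


def rename_or_keep(df, columns):
    """Hypothetical caller: handle the expected error and keep the original df."""
    try:
        return rename_columns(df, columns)
    except ValueError as exc:
        # Only ValueError is caught; unrelated failures still propagate.
        print(f"Skipping rename: {exc}")
        return df
```

Catching `ValueError` rather than `Exception` means unexpected problems, such as a bug in the function or a Spark-side failure, are not silently swallowed.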