
I'm getting a ResourceWarning in every unit test I run on Spark like this:

    /opt/conda/lib/python3.9/socket.py:775: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 37512), raddr=('127.0.0.1', 38975)>
      self._sock = None
    ResourceWarning: Enable tracemalloc to get the object allocation traceback

I tracked it down to DataFrame.toPandas(). Example:

    import unittest
    from pyspark.sql import SparkSession

    class PySparkTestCase(unittest.TestCase):

        def test_convert_to_pandas_df(self):
            spark = SparkSession.builder.master("local[2]").getOrCreate()
            rawData = spark.range(10)
            print("XXX 1")
            pdfData = rawData.toPandas()
            print("XXX 2")
            print(pdfData)

    if __name__ == '__main__':
        unittest.main(verbosity=2)

You'll see the two ResourceWarnings just before the `XXX 2` output line.

However, if you run the same code outside unittest, you won't get the resource warning!

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").getOrCreate()
    rawData = spark.range(10)
    print("XXX 1")
    pdfData = rawData.toPandas()
    print("XXX 2")
    print(pdfData)

So, is unittest doing something to cause this resource warning in toPandas()? I appreciate I could hide the resource warning (e.g., see here or here), but I'd rather not get the resource warning in the first place!
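
For reference, here is a minimal, Spark-free sketch of the same kind of warning, using only the standard library. An unclosed socket stands in for the socket reported in the warning above, and the helper names (`leak_socket`, `count_resource_warnings`) are hypothetical. Python's default warning filters ignore `ResourceWarning`, whereas `unittest.main()` sets its warning filter to `'default'` when no `-W` option is passed, which would account for the warning showing up only under unittest.

```python
import gc
import socket
import warnings


def leak_socket():
    """Open a socket and drop the last reference without calling close()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    del s         # on CPython the socket is finalized here and warns if still open
    gc.collect()  # also collect anything kept alive by reference cycles


def count_resource_warnings(action):
    """Run leak_socket() under the given filter action and count ResourceWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, ResourceWarning)
        leak_socket()
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


if __name__ == "__main__":
    # 'ignore' mirrors Python's built-in default for ResourceWarning;
    # 'default' mirrors what unittest.main() switches on when no -W is given.
    print("ignore :", count_resource_warnings("ignore"))
    print("default:", count_resource_warnings("default"))
```

On CPython this should report no warnings under `ignore` and at least one under `default`, even though the leaking code is identical in both runs.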

snark

1 Answer


You can set an environment variable called `PYTHONWARNINGS` to the value `ignore` before you run your tests, or pass the `-W ignore` option to the Python interpreter.
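
A sketch of both options, driven from Python so it stays self-contained. It runs a tiny child program that leaves a socket unclosed (a stand-in for the Spark code, not the real thing) in a fresh interpreter, configured either with `-W` or with `PYTHONWARNINGS`, and checks whether `ResourceWarning` appears on stderr. `CHILD` and `run_child` are hypothetical names.

```python
import os
import subprocess
import sys

# A tiny child program that drops an open socket without closing it.
CHILD = (
    "import socket, gc\n"
    "s = socket.socket()\n"
    "del s\n"
    "gc.collect()\n"
)


def run_child(warn_option=None, env_value=None):
    """Run CHILD in a fresh interpreter and return what it wrote to stderr.

    warn_option: passed as `-W <value>` on the command line, if given.
    env_value:   placed in the PYTHONWARNINGS environment variable, if given.
    """
    cmd = [sys.executable]
    if warn_option is not None:
        cmd += ["-W", warn_option]
    cmd += ["-c", CHILD]

    env = dict(os.environ)
    env.pop("PYTHONWARNINGS", None)  # start from a clean slate
    if env_value is not None:
        env["PYTHONWARNINGS"] = env_value

    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.stderr


if __name__ == "__main__":
    for label, kwargs in [
        ("-W default", {"warn_option": "default"}),
        ("-W ignore", {"warn_option": "ignore"}),
        ("PYTHONWARNINGS=ignore", {"env_value": "ignore"}),
    ]:
        print(label, "->", "ResourceWarning" in run_child(**kwargs))
```

From a shell the equivalents would be `PYTHONWARNINGS=ignore python -m unittest` or `python -W ignore -m unittest`. A narrower filter such as `-W ignore::ResourceWarning` hides only this category instead of every warning.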

Clasherkasten
  • Thanks; I know I could suppress the warning but I want to know what's causing it in the first place. I might ignore it until a fix is found. Is there a bug in `toPandas()` or unittest for example? – snark Dec 21 '22 at 12:24
  • E.g., I can suppress both ResourceWarnings just by adding this to the `setUp()` method of my unit tests: `warnings.filterwarnings("ignore", category=ResourceWarning, message="unclosed – snark Dec 21 '22 at 12:33
  • Using your example code (not the unittest) and running it through `python -W once` gives me the same resource warning. So my guess is that this is a problem with pyspark, not with unittest. – Clasherkasten Dec 21 '22 at 12:33
  • Update to my previous comment: moving the warnings filter to `setUpClass()` also works, and should be more efficient than running it before every single unit test via `setUp()`. – snark Jan 03 '23 at 16:03
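
A sketch of the `setUpClass()` approach described in the comments, assuming a hypothetical `leaky_operation()` in place of `toPandas()` so it runs without Spark. The `message="unclosed"` regex is an assumption chosen to match the start of the warning text in the question. `warnings.filterwarnings()` inserts its filter at the front of the filter list, so it takes precedence over the `'default'` filter the unittest runner installs.

```python
import gc
import socket
import unittest
import warnings


def leaky_operation():
    """Hypothetical stand-in for toPandas(): leaves a socket unclosed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    del s
    gc.collect()
    return "done"


class QuietResourceWarningTestCase(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Runs once per class rather than once per test, as setUp() would.
        # The filter goes to the front of the list, so it wins over the
        # 'default' filter that the unittest runner sets up.
        warnings.filterwarnings(
            "ignore", category=ResourceWarning, message="unclosed"
        )

    def test_leaky_operation(self):
        self.assertEqual(leaky_operation(), "done")


if __name__ == "__main__":
    unittest.main(verbosity=2)
```

One caveat with this design: a filter added in `setUpClass()` is not removed when the class finishes. It stays active until the runner restores its saved filters at the end of the whole run, so it also applies to any test classes that run afterwards.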