
I have been trying to call toPandas() on a Spark DataFrame loaded from a 5 GB file, and I keep getting a connection refused error.

ConnectionRefusedError                    Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/py4j/clientserver.py in connect_to_java_server(self)
    436                 self.socket = self.ssl_context.wrap_socket(
    437                     self.socket, server_hostname=self.java_address)
--> 438             self.socket.connect((self.java_address, self.java_port))
    439             self.stream = self.socket.makefile("rb")
    440             self.is_connected = True

ConnectionRefusedError: [Errno 111] Connection refused

I have tested the function on files of up to a few hundred MB and it seems to work. Is there a workaround for using toPandas() on larger files?

user460567
    `toPandas()` brings all the data to a single node - the driver node - and this node should be able to handle the data size. [a good read](https://stackoverflow.com/q/47536123/8279585) – samkart Sep 21 '22 at 05:38
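To illustrate the point in the comment above: one possible way to avoid materializing everything on the driver at once is to consume the rows in bounded batches. The sketch below is only that, a sketch. The helper name `iter_batches` and the sample rows are hypothetical, and a plain generator stands in for a Spark row iterator so the snippet runs without a cluster.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches(rows: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield lists of at most batch_size items from any iterable of rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice pulls only the next batch_size items, so memory use stays bounded.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for a Spark row iterator (no Spark needed to run this).
fake_rows = ({"id": i, "value": i * i} for i in range(10))
batch_sizes = [len(batch) for batch in iter_batches(fake_rows, 4)]
print(batch_sizes)  # [4, 4, 2]
```

With PySpark, the same helper could be fed `df.toLocalIterator()` (where `df` is assumed to be your DataFrame), building a small pandas frame per batch with something like `pandas.DataFrame([row.asDict() for row in batch])`. If the full result really has to live in pandas, the driver needs enough memory for it, which usually means raising `spark.driver.memory` and `spark.driver.maxResultSize`. The connection refused error may simply be a symptom of the driver JVM having died, possibly from running out of memory, though the traceback alone does not confirm that.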
