I am working with Spark and HBase (using the HappyBase library), and everything works fine with small datasets. However, with big datasets the connection to HBase Thrift is lost after many calls to the map function. I am currently running a single pseudo-distributed node.
Concretely, the following error occurs inside the map function:
TTransportException: Could not connect to localhost:9090
Map function:
import happybase

def save_triples(triple, ac, table_name, ac_vertex_id, graph_table_name):
    # A new Thrift connection is opened (and closed) for every single record
    connection = happybase.Connection(HBASE_SERVER_IP, compat='0.94')
    table = connection.table(table_name)
    [...]
    connection.close()
This is the call to the map function:
counts = lines.map(lambda x: save_triples(x, ac, table_name, ac_vertex_id, graph_table_name))
output = counts.collect()
I suspect this happens because too many connections are being opened. I have tried creating the 'connection' object in the main function and passing it to the map function as a parameter (something like this works with the HBase libraries in Java), but I get the following error:
pickle.PicklingError: Can't pickle builtin <type 'method_descriptor'>
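For reference, the attempt looked roughly like this (a rough sketch using the same names as above, and assuming save_triples is changed to accept the connection as an extra argument). The connection is created once on the driver and then captured by the lambda's closure, which Spark has to pickle before shipping it to the workers, and the Thrift client held by happybase.Connection does not seem to be picklable:

import happybase

# Attempted approach (sketch): open the connection once on the driver...
connection = happybase.Connection(HBASE_SERVER_IP, compat='0.94')

# ...and pass it into the map function. Spark serializes the lambda's closure
# to send it to the workers, and the Thrift transport inside the connection
# cannot be pickled, which is what triggers the PicklingError above.
counts = lines.map(lambda x: save_triples(x, ac, table_name, ac_vertex_id,
                                          graph_table_name, connection))
output = counts.collect()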
Any help would be appreciated.