I have a Spark job that works perfectly on its own. I wanted to parallelize the search and transformation operations on two DataFrames before joining them, so I decided to use threads — but after `start()` and `join()` the Spark functions no longer work.
T_df_1 = ThreadWithReturnValue(target=thread_df_1, args=(arg1, arg2))
T_df_2 = ThreadWithReturnValue(target=thread_df_2, args=(arg3, arg4, args))
T_df_1.start()
T_df_2.start()
df_1 = T_df_1.join()
df_2 = T_df_2.join()
print(f'df_1 dim = {[df_1.count(),len(df_1.columns)]}')
Error message:
ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/xxx.fy", line 464, in <module>
print(f'df_1 dim = {[df_1.count(),len(df_1.columns)]}')
AttributeError: 'NoneType' object has no attribute 'count'
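For context, `ThreadWithReturnValue` is not part of the standard library. A common sketch of it (an assumption — the actual class used in the job may differ) subclasses `threading.Thread`, captures the target's return value in `run()`, and hands it back from `join()`. With this pattern, `join()` returns `None` whenever the target function itself returns nothing, which would produce exactly the `'NoneType' object has no attribute 'count'` error above:

```python
import threading

class ThreadWithReturnValue(threading.Thread):
    """Thread whose join() returns the target's return value."""

    def __init__(self, target=None, args=(), kwargs=None):
        super().__init__(target=target, args=args, kwargs=kwargs or {})
        self._return = None

    def run(self):
        # Capture the target's return value instead of discarding it.
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._return

# Stand-in for thread_df_1: the target MUST return its result,
# otherwise join() hands back None.
def make_value(x, y):
    return x + y

t = ThreadWithReturnValue(target=make_value, args=(1, 2))
t.start()
result = t.join()  # → 3
```

If the real `thread_df_1` builds a DataFrame but never `return`s it, `df_1` ends up `None` regardless of how the thread class is written, so that is the first thing worth checking.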