I'm trying to run unit tests on my PySpark scripts locally so that I can integrate them into our CI.
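Concretely, the kind of test I want to run in CI looks roughly like this (a minimal sketch assuming pytest and a plain local SparkSession; the fixture and test names are just illustrative):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession, no cluster or dev endpoint required
    return (SparkSession.builder
            .master("local[*]")
            .appName("glue-unit-tests")
            .getOrCreate())

def test_simple_transform(spark):
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])
    assert df.filter(df.a > 1).count() == 1

Starting a local pyspark shell and building a test DataFrame works fine: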
$ pyspark
...
>>> import pandas as pd
>>> df = pd.DataFrame([(1,2,3), (4,5,6)])
>>> df
   0  1  2
0  1  2  3
1  4  5  6
As per the documentation, I should be able to convert using the following:
from awsglue.dynamicframe import DynamicFrame
dynamic_frame = DynamicFrame.fromDF(dataframe, glue_ctx, name)
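So what I expect to be able to do locally is roughly the following (a sketch based on that documented signature; it assumes the awsglue package is importable and that a GlueContext can be built from a local SparkContext, which is exactly the step that fails below):

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

sc = SparkContext.getOrCreate()
glue_ctx = GlueContext(sc)
spark = glue_ctx.spark_session

# fromDF expects a Spark DataFrame, so build one and wrap it
dataframe = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])
dynamic_frame = DynamicFrame.fromDF(dataframe, glue_ctx, "test_frame")
print(dynamic_frame.count())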
But when I try to convert to a DynamicFrame, I get errors when instantiating the GlueContext:
$ pyspark
>>> from awsglue.context import GlueContext
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> glueContext = GlueContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/awsglue/context.py", line 43, in __init__
    self._glue_scala_context = self._get_glue_scala_context(**options)
  File "/Library/Python/2.7/site-packages/awsglue/context.py", line 63, in _get_glue_scala_context
    return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable
How do I get this working WITHOUT using AWS Glue dev endpoints? I don't want to be charged every time I commit my code; that's absurd.