I am trying to load multiple files in a single load call. They are all partitioned files. With one file it works, but when I list 24 files I get the error below. I could not find any documentation of this limitation, or any workaround other than loading the files separately and taking the union afterwards. Are there any alternatives?
Code to reproduce the problem:
basePath = '/file/'
paths = ['/file/df201601.orc', '/file/df201602.orc', '/file/df201603.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc',
'/file/df201604.orc', '/file/df201605.orc', '/file/df201606.orc', ]
df = sqlContext.read.format('orc') \
    .options(header='true', inferschema='true', basePath=basePath) \
    .load(*paths)
Error received:
TypeError Traceback (most recent call last)
<ipython-input-43-7fb8fade5e19> in <module>()
---> 37 df = sqlContext.read.format('orc') .options(header='true', inferschema='true',basePath=basePath) .load(*paths)
38
TypeError: load() takes at most 4 arguments (24 given)
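For what it's worth, the message seems to come from the `*paths` unpacking rather than from a limit on the number of files: each path becomes its own positional argument, and `load()` only has a few positional slots. Below is a minimal sketch in plain Python, with no Spark involved. `FakeReader` is a hypothetical stand-in that I am assuming has the same parameter shape as PySpark's `DataFrameReader.load(path=None, format=None, schema=None, **options)`; whether the real `load()` accepts a list for `path` depends on the Spark version, so treat that part as an assumption to check against the docs.

```python
# Hypothetical stand-in, NOT the real PySpark class. It only mimics the
# assumed parameter shape of DataFrameReader.load().
class FakeReader:
    def load(self, path=None, format=None, schema=None, **options):
        # Accept either a single string or a list of strings for `path`.
        if isinstance(path, str):
            return [path]
        return list(path or [])


# Six of the file names from the question, built with a format string.
paths = ['/file/df2016%02d.orc' % month for month in range(1, 7)]
reader = FakeReader()

# Passing the list itself fills only the `path` parameter.
loaded = reader.load(paths)

# Unpacking with * spreads the six paths over path, format, schema and
# beyond, which is more positional arguments than load() accepts.
try:
    reader.load(*paths)
    unpack_error = None
except TypeError as exc:
    unpack_error = str(exc)

print(loaded)
print(unpack_error)
```

If the installed Spark version does accept a list for `path`, the equivalent change in the code above would be `.load(paths)` instead of `.load(*paths)`.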