I am trying to parse multiple Word (.doc) files in Apache Spark. When I run the script via spark-submit (a word count, for example), it fails with the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' ... ordinal not in range(128).
Can Spark parse Microsoft Word documents? If not, is there a workaround?
Thanks.