In PySpark, I am using sqlContext.sql to select the tweet text and a UDF to extract the hashtags from it, but I cannot get the result into the shape I need.
This is the code and the DataFrame it currently produces:
hash_tags_fun = udf(lambda t: re.findall('(#[^#]\w{3,})', t))
hash_tags_in_tweets_df.registerTempTable("hash_tags_table")
hash_tags_result = sqlContext.sql("SELECT text FROM hash_tags_table")
hash_tags_list = hash_tags_result.withColumn('text', hash_tags_fun('text'))
hash_tags_list.show(3)
+-------------------+
|               text|
+-------------------+
|  [#shutUpAndDANCE]|
|  [#SHINee, #AMBER]|
|[#JR50, #flipagram]|
+-------------------+
I need one hashtag per row, like this:
+-------------------+
|               text|
+-------------------+
|    #shutUpAndDANCE|
|            #SHINee|
|             #AMBER|
|              #JR50|
|         #flipagram|
+-------------------+
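To make the target shape concrete, here is a minimal plain-Python sketch (no Spark involved) of the same transformation: apply the regex to each tweet, then flatten the per-tweet lists into one hashtag per entry. The sample tweets and the names sample_tweets, tags_per_tweet and flat_tags are made up for illustration; only the regex comes from the code above.

```python
import re

# Same pattern as in the UDF above, written as a raw string.
HASH_TAG_PATTERN = re.compile(r'(#[^#]\w{3,})')

# Hypothetical sample tweets, invented for this illustration.
sample_tweets = [
    "Let's #shutUpAndDANCE tonight",
    "#SHINee and #AMBER live",
    "New #JR50 clip on #flipagram",
]

# One list of hashtags per tweet: this is the shape the UDF produces.
tags_per_tweet = [HASH_TAG_PATTERN.findall(t) for t in sample_tweets]

# One hashtag per entry: this is the shape explode() is meant to give.
flat_tags = [tag for tags in tags_per_tweet for tag in tags]

for tag in flat_tags:
    print(tag)
```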
Calling hash_tags_list.withColumn("text", explode("text")) raises the following error:
AnalysisException: u"cannot resolve 'explode(`text`)' due to data type mismatch: input to function explode should be array or map type, not string;;\n'Project [explode(text#24) AS text#68]\n+- AnalysisBarrier\n
+- Project [(text#9) AS text#24]\n
+- Project [text#9]\n
+- SubqueryAlias hash_tags_table\n
+- Project [text#9]\n
+- Filter text#9 LIKE %#%\n
+- SubqueryAlias twt\n
+- SubqueryAlias tweets\n
+- Relation[country#6,id#7,place#8,text#9,user#10] json\n"
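The error says the text column is a string, even though show() prints something that looks like a list. A likely cause is that udf() defaults to a StringType return type when none is given, so the list returned by re.findall is stored as its string form. Below is a minimal sketch of one way around this, assuming that is indeed the cause: declare the return type as ArrayType(StringType()) and then explode. The helper names find_hash_tags and explode_hash_tags are hypothetical and not part of the original code.

```python
import re

# Same pattern as in the question, written as a raw string.
HASH_TAG_PATTERN = re.compile(r'(#[^#]\w{3,})')


def find_hash_tags(text):
    """Return the list of hashtags in one tweet (empty list for None)."""
    return HASH_TAG_PATTERN.findall(text) if text is not None else []


def explode_hash_tags(df):
    """Return df with one hashtag per row in its 'text' column."""
    # Imported here so find_hash_tags can be tried without Spark installed.
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import ArrayType, StringType

    # Declaring the return type is the key change: without it the UDF
    # column has string type, which explode() rejects.
    hash_tags_fun = udf(find_hash_tags, ArrayType(StringType()))
    return (df.withColumn('text', hash_tags_fun('text'))
              .withColumn('text', explode('text')))
```

With the DataFrame from the question, this would be called as explode_hash_tags(hash_tags_result).show(). Rows whose tweet contains no matching hashtag yield an empty array and are dropped by explode.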