I have the following schema:
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType

schema = StructType([
    StructField('title', StringType(), True),
    StructField('author', ArrayType(StringType()), True),
    StructField('year', IntegerType(), True),
    StructField('url', StringType(), True)])
article = sqlContext.read.format('com.databricks.spark.xml') \
    .options(rowTag='article', excludeAttribute=True, charset='utf-8') \
    .load('source.xml', schema=schema)
where author contains the names of several authors. I can filter on a name inside author with array_contains, like this:
from pyspark.sql.functions import array_contains

name = 'Tom Cat'
article.filter(array_contains(article.author, name)).show()
However, I wonder whether there is a way to filter on a name while ignoring case, something like:
name = 'tom cat'
article.filter(array_contains(article.author, name, CASE_INSENSITIVE)).show()
so that I get the same result as the previous query.
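
To make the behaviour I am after concrete, here is a minimal sketch of one possible approach, not necessarily the best one. As far as I know array_contains takes no case-insensitivity flag, so the sketch wraps a plain Python comparison in a UDF. The names contains_ignore_case and filter_by_author are hypothetical helpers made up for illustration, and the sketch assumes the author column is an array of strings as in the schema above.

```python
def contains_ignore_case(names, target):
    """Return True if target equals any element of names, ignoring case."""
    if names is None:
        # A row with no author array cannot match.
        return False
    wanted = target.lower()
    # Skip null elements inside the array instead of failing on them.
    return any(n is not None and n.lower() == wanted for n in names)


def filter_by_author(df, name):
    """Keep rows of df whose 'author' array contains name, ignoring case.

    Hypothetical helper: assumes df has an array-of-strings column 'author'.
    """
    # Imported here so contains_ignore_case can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    match = udf(lambda authors: contains_ignore_case(authors, name), BooleanType())
    return df.filter(match(df.author))
```

With this, filter_by_author(article, 'tom cat').show() should return the same rows as the array_contains query with 'Tom Cat'. A Python UDF is usually slower than a built-in function, so a built-in alternative would be preferable if one exists for the Spark version in use.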