
I'm trying to upgrade from Spark 2.1 to 2.2. When I try to read or write a DataFrame to a location (CSV or JSON), I receive this error:

Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:81)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:43)
at org.apache.spark.sql.execution.datasources.json.JsonFileFormat.inferSchema(JsonFileFormat.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:176)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:333)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:279)

I am not setting a value for dateFormat anywhere, so I don't understand where this pattern is coming from.

spark.createDataFrame(objects.map((o) => MyObject(t.source, t.table, o.partition, o.offset, d)))
    .coalesce(1)
    .write
    .mode(SaveMode.Append)
    .partitionBy("source", "table")
    .json(path)

I still get the error with this:

import org.apache.spark.sql.{SaveMode, SparkSession}
val spark = SparkSession.builder.appName("Spark2.2Test").master("local").getOrCreate()
import spark.implicits._
// Person is not defined in the original snippet; inferred from the schema below
case class Person(name: String, age: Long)

val agesRows = List(Person("alice", 35), Person("bob", 10), Person("jill", 24))
val df = spark.createDataFrame(agesRows).toDF()

df.printSchema
df.show

df.write.mode(SaveMode.Overwrite).csv("my.csv")

Here is the schema:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
  • I don't see anything wrong with your code. Can you please share the MyObject class definition? Try converting the object to JSON manually, then saving it as a string. – Rahul Sharma Sep 26 '17 at 15:05
  • case class MyObject(source: String, table: String, partition: Int, offset: Long, updatedOn: String) – Lee Sep 26 '17 at 17:17
  • Read and write date fields as String, and operate on the date field manually using SimpleDateFormat. – Rahul Sharma Sep 26 '17 at 18:31

4 Answers


I found the answer.

The default for timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, and the XXX component is the "illegal pattern component" from the exception. The option needs to be set explicitly when you write the DataFrame out.

The fix is to change the time-zone part to ZZ, which still includes the time zone:

df.write
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
  .mode(SaveMode.Overwrite)
  .csv("my.csv")
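For context, the XXX token is not inherently illegal: the JDK's own java.text.SimpleDateFormat has accepted it (as an ISO-8601 zone offset) since Java 7. A quick Spark-independent sanity check of the default pattern, sketched below; the exception comes from FastDateFormat, which (as other answers here note) only handles this pattern in commons-lang3 3.5:

```scala
import java.text.SimpleDateFormat
import java.util.Date

// The default Spark pattern, verified against the JDK's SimpleDateFormat.
// It parses fine here; the "Illegal pattern component: XXX" error is specific
// to FastDateFormat from a pre-3.5 commons-lang3 on the classpath.
val jdkFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
println(jdkFormat.format(new Date)) // e.g. 2018-03-14T18:14:00.000+01:00
```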
  • Also if you're trying to read a file: `df = spark.read.option('timestampFormat', 'yyyy/MM/dd HH:mm:ss ZZ').json(PATH_TO_FILE)` – William Luxion Jan 22 '18 at 05:58
  • Correct, this only happens for CSV and JSON. – Lee Mar 14 '18 at 18:14
  • Oddly ... I have no timestamps in my output. There is a timestamp column in the prior-to-filtering stages, but still this option was required to avoid the stackdump. – codeaperature Oct 25 '18 at 20:47

Ensure you are using the correct version of commons-lang3:

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.5</version>
</dependency>
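For SBT users, the same pin can be expressed as a build.sbt line (a sketch of the equivalent dependency, not a complete build file):

```scala
// build.sbt fragment: pin commons-lang3 to 3.5 so it wins over older copies
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.5"
```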
  • Why does commons-lang3 have anything to do with this? – Haha TTpro Apr 18 '18 at 03:09
  • I'm also interested in an explanation – Romibuzi Sep 18 '18 at 12:24
  • In CDH, hive-exec-1.1.0-cdh5.15.1.jar also contains the class "FastDateFormat", which does not support the default format "yyyy-MM-dd'T'HH:mm:ss.SSSXXX" of org.apache.spark.sql.catalyst.json.JSONOptions. So ensure the commons-lang3 3.5 jar is in your classpath. In SBT, add the dependency with the compile option: "org.apache.commons" % "commons-lang3" % "3.5" % "compile" – Nagaraj Vittal Apr 29 '19 at 09:18

Using commons-lang3-3.5.jar fixed the original error. I didn't check the source code to tell why, but it is not surprising, since the original exception happens at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282). I also noticed the file /usr/lib/spark/jars/commons-lang3-3.5.jar (on an EMR cluster instance), which also suggests 3.5 is the consistent version to use.
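To confirm which jar actually supplies FastDateFormat at runtime, a small diagnostic can be pasted into spark-shell (jarFor is a hypothetical helper written here, not a library function):

```scala
// Resolve a class by name and report the jar (code source) it was loaded from.
// Returns None if the class is missing or has no code source (e.g. JDK classes).
def jarFor(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
  } catch { case _: ClassNotFoundException => None }

println(jarFor("org.apache.commons.lang3.time.FastDateFormat")
  .getOrElse("commons-lang3 not on the classpath"))
```

If the printed path points at a Hive or other bundled jar rather than commons-lang3-3.5.jar, that shadowing is the likely cause of the error.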


I also hit this problem. In my case the cause was that I had put a badly formatted JSON file on HDFS. After I put a correctly formatted text/JSON file there, it worked.

  • There were no timestamps in the file, just epochs, which are longs. Thanks for the comment. – Lee Oct 28 '19 at 14:15