The timestamps in my real data look like this:

2018-02-28T00:05:20.3717898Z 
2018-02-28T00:05:23.6589778Z 
2018-02-28T00:05:23.9119922Z 
2018-02-28T00:05:25.4230787Z 
2018-02-28T00:05:25.6710929Z 
2018-02-28T00:05:26.4271361Z 

And I use this code to read the data

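from pyspark.sql.types import StructType
userSchema = StructType().add('time', 'timestamp')  # one column named 'time', typed as timestamp
s = spark.readStream.schema(userSchema).csv('xxxx')  # streaming CSV source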

The result is like this

I have no idea how this happened.

user238607
ellie
  • I think Spark might be reading it in the correct format. What could be happening is that it is showing you a truncated form. Try s.show(10, truncate=False). Here is a question with exactly the same problem as yours: https://stackoverflow.com/questions/33742895/how-to-show-full-column-content-in-a-spark-dataframe – user238607 Nov 16 '18 at 07:17
  • Thanks, your answer was very helpful. But a streaming DataFrame doesn't support the show() function. I tried setting the timestamp format when reading the data and using option("truncate", False) on writeStream, and the results look much better. – ellie Nov 16 '18 at 15:27
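
For anyone landing here, the workaround described in the last comment could look roughly like the sketch below. It is only an illustration: the helper name start_console_stream and its parameters are made up for this example, and the comment does not say which timestamp pattern was actually used, so the pattern is left as an argument. The "timestampFormat" CSV option and the "truncate" option of the console sink are standard Spark options.

```python
def start_console_stream(spark, schema, path, timestamp_format):
    """Read a CSV stream with an explicit timestamp pattern and print
    untruncated rows to the console.

    Hypothetical helper for illustration; `spark` is an existing SparkSession.
    """
    stream = (
        spark.readStream
        .schema(schema)
        # Tell the CSV reader how to parse the timestamp column.
        .option("timestampFormat", timestamp_format)
        .csv(path)
    )
    return (
        stream.writeStream
        .format("console")
        # Streaming DataFrames have no show(); the console sink takes
        # "truncate" as an option instead.
        .option("truncate", False)
        .start()
    )
```

With the names from the question this would be called as start_console_stream(spark, userSchema, 'xxxx', fmt), where fmt is whatever pattern matches the data. How fractional seconds longer than three digits are parsed may differ between Spark versions, so the pattern is worth checking against the version in use.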

0 Answers