I was reading through this post, https://nycdatascience.com/blog/student-works/yelp-recommender-part-2/, and followed basically everything it showed. However, after reading the post "Spark 2.1 Structured Streaming - Using Kakfa as source with Python (pyspark)", when I run
SPARK_HOME/bin/spark-submit read_stream_spark.py --master local[4] --jars spark-sql-kafka-0.10_2.11-2.1.0.jar
I still get the error 'Failed to find data source: kafka'.
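One thing I am unsure about: my understanding is that spark-submit treats everything after the application file as arguments to the script itself, not to spark-submit, so maybe the options need to come first. Is something like this the intended invocation (a sketch, assuming the same script and connector version)?

```shell
# Options before the application file; anything after
# read_stream_spark.py would be passed to the script itself.
# --packages pulls the connector and its transitive dependencies
# from Maven, instead of pointing at a single local jar.
$SPARK_HOME/bin/spark-submit \
  --master local[4] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  read_stream_spark.py
```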
I also read through https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html. The official doc's example uses two hosts and two ports ("host1:port1,host2:port2"), while I only use one. Should I specify another host and port besides my cloud server and the Kafka port? Thanks.
Could you please let me know what I am missing? Or should I not be running the script on its own?