I have a data set like the one below, stored in the file test.txt:
149|898|20180405
135|379|20180428
135|381|20180406
31|898|20180429
31|245|20180430
135|398|20180422
31|448|20180420
31|338|20180421
I created a data frame by executing the code below.
from pyspark.sql import Row, SparkSession, SQLContext

spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
# Split each line on "|" and map the three fields to named columns
df_transac = spark.createDataFrame(sc.textFile("test.txt")\
.map(lambda x: x.split("|")[:3])\
.map(lambda r: Row(cCode=r[0], pCode=r[1], mDate=r[2])))
df_transac.show()
+-----+-----+--------+
|cCode|pCode|   mDate|
+-----+-----+--------+
|  149|  898|20180405|
|  135|  379|20180428|
|  135|  381|20180406|
|   31|  898|20180429|
|   31|  245|20180430|
|  135|  398|20180422|
|   31|  448|20180420|
|   31|  338|20180421|
+-----+-----+--------+
The output of printSchema() looks like this:
df_transac.printSchema()
root
 |-- cCode: string (nullable = true)
 |-- pCode: string (nullable = true)
 |-- mDate: string (nullable = true)
Now I want to create a data frame filtered by my input dates, i.e. date1="20180425" and date2="20180501", keeping only the rows whose mDate falls between the two.
My expected output is:
+-----+-----+--------+
|cCode|pCode|   mDate|
+-----+-----+--------+
|  135|  379|20180428|
|   31|  898|20180429|
|   31|  245|20180430|
+-----+-----+--------+
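To make the rule concrete, here is a minimal plain-Python sketch of the filter I have in mind, with no Spark involved. It assumes mDate is always an 8-character yyyyMMdd string, so comparing the strings gives the same order as comparing the dates, and it assumes both bounds are inclusive. The helper name in_range is only for illustration.

```python
# Plain-Python sketch of the date-range rule (no Spark needed).
# Assumption: mDate is a fixed-width yyyyMMdd string, so string comparison
# orders rows the same way as date comparison would.
lines = [
    "149|898|20180405",
    "135|379|20180428",
    "135|381|20180406",
    "31|898|20180429",
    "31|245|20180430",
    "135|398|20180422",
    "31|448|20180420",
    "31|338|20180421",
]


def in_range(lines, date1, date2):
    """Keep rows whose third field lies in [date1, date2], bounds inclusive."""
    rows = [line.split("|")[:3] for line in lines]
    return [row for row in rows if date1 <= row[2] <= date2]


selected = in_range(lines, "20180425", "20180501")
for c_code, p_code, m_date in selected:
    print(c_code, p_code, m_date)
```

I suspect the DataFrame equivalent is a filter on the mDate column, perhaps using Column.between(date1, date2), but I am not sure how to apply it to df_transac.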
Please help: how can I achieve this with a Spark data frame?