
I need to create a new column (FILE_DT) and apply a constant value to all the rows after reading this CSV file as a PySpark DataFrame.

For example: Sample dataframe

constant value: 2022-10-01

NAME   INFO   TITLE   FILE_DT
AAA    222     BBB    2022-10-01
ACC    111     CCB    2022-10-01  
ADD    333     DDC    2022-10-01
ASS    444     NNC    2022-10-01
    something like: `df.withColumn("FILE_DT", F.to_date(F.regexp_extract(F.input_file_name(), r"ORDERS_E.*_D(\d+)$", 1), "yyMMdd"))` – blackbishop Oct 11 '22 at 16:07
  • @blackbishop Thanks for your input! Could you pls tell what is F? – Anos Oct 11 '22 at 16:10
  • Does this answer your question? [How to add a constant column in a Spark DataFrame?](https://stackoverflow.com/questions/32788322/how-to-add-a-constant-column-in-a-spark-dataframe) – werner Oct 11 '22 at 16:28
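For context, the `F` in the first comment is the conventional alias for `pyspark.sql.functions` (`import pyspark.sql.functions as F`). The regex idea from that comment can be sketched in plain Python, without a Spark session, to show what the capture group extracts; the file name follows the pattern in the question, and handling of the `.csv` suffix here is an assumption:

```python
import re
from datetime import datetime

# Hypothetical file name following the ORDERS_E..._D<yyMMdd>.csv pattern
name = "ORDERS_E220928_D220928.csv"

# Capture the yyMMdd digits after the final "_D", as in the comment's regex
match = re.search(r"ORDERS_E.*_D(\d+)\.csv$", name)
file_dt = datetime.strptime(match.group(1), "%y%m%d").date()
print(file_dt)  # 2022-09-28
```

In Spark the same capture would be done with `F.regexp_extract(F.input_file_name(), ...)` followed by `F.to_date(..., "yyMMdd")`, so the date comes from each file's own name rather than a hard-coded constant.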

2 Answers


I tried the code below. It works, but I'm looking for better logic.

from datetime import datetime
from pyspark.sql.functions import lit

object_name = "ORDERS_E220928_D220928.csv"

# current year + the MMdd part sliced out of the file name
current_day = datetime.today().strftime("%Y%m%d")
filedate = current_day[0:4] + object_name[18:22]
print(filedate)    # 20220928

# reformat yyyyMMdd to yyyy-MM-dd
d = datetime.strptime(filedate, "%Y%m%d")
s = d.strftime("%Y-%m-%d")
print(s)    # 2022-09-28

df.withColumn("file_dt", lit(s)).show()
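One fragility in the slicing above: it takes the year from today's date, so a file from December processed in January would get the wrong year. A sketch that parses the full yyMMdd date directly from the file name instead (the file name is the one from the question; the split on `"_D"` assumes that marker appears only before the date, as in the sample):

```python
from datetime import datetime

object_name = "ORDERS_E220928_D220928.csv"

# Drop the extension, then take the 6-digit yyMMdd date after the last "_D"
stem = object_name.rsplit(".", 1)[0]    # "ORDERS_E220928_D220928"
date_part = stem.rsplit("_D", 1)[1]     # "220928"
s = datetime.strptime(date_part, "%y%m%d").strftime("%Y-%m-%d")
print(s)  # 2022-09-28
```

The resulting string can be passed to `lit(s)` exactly as in the answer above.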
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 16 '22 at 08:02

The simplest way:

import pyspark.sql.functions as F

df_with_date = df.withColumn("FILE_DT", F.lit("2022-10-01").cast("date"))
