0

May I know what's wrong with below code? it does not print anything.

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp,last_day,next_day, date_format, date_add, year, month, dayofmonth, dayofyear, dayofweek, date_trunc, date_sub, to_date, add_months, weekofyear, quarter, col
from pyspark.sql.types import StructType,StructField,StringType, IntegerType
ss = SparkSession.builder.appName('DateDim').master('local[1]').getOrCreate()
df = ss.createDataFrame([],StructType([]))
current_date()
df = df.select(current_date().alias("current_date"),next_day(current_date(), 'sunday').alias("next_day"),dayofweek(current_date()).alias("day_of_week"),dayofmonth(current_date()).alias("day_of_month"),dayofyear(current_date()).alias("day_of_year"),last_day(current_date()).alias("last_day"),year(current_date()).alias("year"),month(current_date()).alias("month"), weekofyear(current_date()).alias("week_of_year"),quarter(current_date()).alias("quarter")).collect()
print(df)
for i in range(1, 1000):
    print(i)
for i in range(1, 1000):
    v_date = date_add(v_date, i)
    df.unionAll(df.select(v_date.alias("current_date"),next_day(v_date,'sunday').alias("next_day"),dayofweek(v_date).alias("day_of_week"),dayofmonth(v_date).alias("day_of_month"),dayofyear(v_date).alias("day_of_year"),last_day(v_date).alias("last_day"),year(v_date).alias("year"),month(v_date).alias("month"), weekofyear(v_date).alias("week_of_year"),quarter(v_date).alias("quarter")))
df.show()
  • it's because there are no rows in the initial `df`. creating new columns will only create columns with no values as there are no rows in the `df` being used – samkart Aug 27 '22 at 11:05
  • Does this answer your question? [SparkSQL on pyspark: how to generate time series?](https://stackoverflow.com/questions/43141671/sparksql-on-pyspark-how-to-generate-time-series) – ZygD Sep 22 '22 at 11:08

2 Answers2

0

You're getting zero rows because there are no rows in the initial df. Any columns being created will have no values as there are no rows in df.

It seems you're trying to create a dataframe with 1000 dates starting from the current day. There's a simple approach using sequence function.

data_sdf = spark.createDataFrame([(1,)], 'id string')

data_sdf. \
    withColumn('min_dt', func.current_date().cast('date')). \
    withColumn('max_dt', func.date_add('min_dt', 1000).cast('date')). \
    withColumn('all_dates', func.expr('sequence(min_dt, max_dt, interval 1 day)')). \
    withColumn('dates_exp', func.explode('all_dates')). \
    drop('id'). \
    show(10)

# +----------+----------+--------------------+----------+
# |    min_dt|    max_dt|           all_dates| dates_exp|
# +----------+----------+--------------------+----------+
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-27|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-28|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-29|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-30|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-31|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-01|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-02|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-03|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-04|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-05|
# +----------+----------+--------------------+----------+
# only showing top 10 rows

select the dates_exp field for further use.

samkart
  • 6,007
  • 2
  • 14
  • 29
0

You want to use the range() function which generates rows (using sequence you will generate an array which you then need to explode into rows).

That's how you can use it:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, next_day, dayofweek, dayofmonth, dayofyear, last_day, year, month, \
    weekofyear, quarter, current_date

spark = SparkSession.builder.getOrCreate()

(
    spark
    .range(0, 1000)
    .alias("id")
    .select(
        (current_date() + col('id').cast("int")).alias("date")
    )
    .select(
        "date",
        next_day("date", 'sunday').alias("next_sunday"),
        dayofweek("date").alias("day_of_week"),
        dayofmonth("date").alias("day_of_month"),
        dayofyear("date").alias("day_of_year"),
        last_day("date").alias("last_day"),
        year("date").alias("year"),
        month("date").alias("month"),
        weekofyear("date").alias("week_of_year"),
        quarter("date").alias("quarter")
    )
).show()

it returns

+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
|      date|next_sunday|day_of_week|day_of_month|day_of_year|  last_day|year|month|week_of_year|quarter|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
|2022-09-22| 2022-09-25|          5|          22|        265|2022-09-30|2022|    9|          38|      3|
|2022-09-23| 2022-09-25|          6|          23|        266|2022-09-30|2022|    9|          38|      3|
|2022-09-24| 2022-09-25|          7|          24|        267|2022-09-30|2022|    9|          38|      3|
|2022-09-25| 2022-10-02|          1|          25|        268|2022-09-30|2022|    9|          38|      3|
|2022-09-26| 2022-10-02|          2|          26|        269|2022-09-30|2022|    9|          39|      3|
|2022-09-27| 2022-10-02|          3|          27|        270|2022-09-30|2022|    9|          39|      3|
|2022-09-28| 2022-10-02|          4|          28|        271|2022-09-30|2022|    9|          39|      3|
|2022-09-29| 2022-10-02|          5|          29|        272|2022-09-30|2022|    9|          39|      3|
|2022-09-30| 2022-10-02|          6|          30|        273|2022-09-30|2022|    9|          39|      3|
|2022-10-01| 2022-10-02|          7|           1|        274|2022-10-31|2022|   10|          39|      4|
|2022-10-02| 2022-10-09|          1|           2|        275|2022-10-31|2022|   10|          39|      4|
|2022-10-03| 2022-10-09|          2|           3|        276|2022-10-31|2022|   10|          40|      4|
|2022-10-04| 2022-10-09|          3|           4|        277|2022-10-31|2022|   10|          40|      4|
|2022-10-05| 2022-10-09|          4|           5|        278|2022-10-31|2022|   10|          40|      4|
|2022-10-06| 2022-10-09|          5|           6|        279|2022-10-31|2022|   10|          40|      4|
|2022-10-07| 2022-10-09|          6|           7|        280|2022-10-31|2022|   10|          40|      4|
|2022-10-08| 2022-10-09|          7|           8|        281|2022-10-31|2022|   10|          40|      4|
|2022-10-09| 2022-10-16|          1|           9|        282|2022-10-31|2022|   10|          40|      4|
|2022-10-10| 2022-10-16|          2|          10|        283|2022-10-31|2022|   10|          41|      4|
|2022-10-11| 2022-10-16|          3|          11|        284|2022-10-31|2022|   10|          41|      4|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
only showing top 20 rows
botchniaque
  • 4,698
  • 3
  • 35
  • 63