DATA Account_cyc_2;
  set Account_cy;
  by acc_no;
  cycle+1;                      /* sum statement: retained counter, +1 per row */
  if first.acc_no then cycle=0; /* reset on the first row of each acc_no group */
RUN;
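In SAS terms, cycle+1 is a sum statement (its value is retained across rows) and first.acc_no is true on the first observation of each BY group, so this step assigns a 0-based row counter within each acc_no group: 0, 1, 2, and so on.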

from pyspark.sql import Window
from pyspark.sql.functions import lag, when

windowSpec = Window.partitionBy("acc_no").orderBy("some_column")

df_Account_cyc_2 = (
    df_Account_cyc_1
    .withColumn("cycle", lag("cycle", default=0).over(windowSpec) + 1)
    .withColumn(
        "cycle",
        when(
            df_Account_cyc_1["acc_no"] != lag(df_Account_cyc_1["acc_no"], 1).over(windowSpec),
            0,
        ).otherwise(df_Account_cyc_1["cycle"]),
    )
)

but this fails with: cannot resolve "cycle" given input columns
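The error arises because the first withColumn call references a cycle column that does not yet exist on df_Account_cyc_1. Since the SAS step boils down to a 0-based row index per acc_no group, a minimal PySpark sketch could use row_number, as the linked question in the comments suggests; here some_column is assumed to stand in for whatever column defines the row order:

from pyspark.sql import Window
from pyspark.sql.functions import row_number

# Window per account; SAS relies on the dataset's physical row order,
# so PySpark needs an explicit ordering column (assumed here).
windowSpec = Window.partitionBy("acc_no").orderBy("some_column")

# row_number() is 1-based within each partition; subtracting 1
# reproduces the SAS counter that starts at 0 per group.
df_Account_cyc_2 = df_Account_cyc_1.withColumn(
    "cycle", row_number().over(windowSpec) - 1
)

# Hypothetical trace for rows (acc_no, some_column):
#   (A, 1), (A, 2), (B, 1)  ->  cycle: 0, 1, 0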

  • Please add your input data along with the expected output. – Dipanjan Mallick Apr 04 '23 at 11:55
  • Are you trying to create an incremental value by account number? If so, this should answer your question: https://stackoverflow.com/questions/45513959/pyspark-get-row-number-for-each-row-in-a-group – s_pike Apr 04 '23 at 13:04
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Apr 04 '23 at 23:34
  • @DipanjanMallick please see the SAS input: DATA Account_cyc_2; set Account_cy; cycle+1; by acc_no; if first.acc_no then cycle=0; RUN; I want this in PySpark code. – Anil Apr 05 '23 at 13:04
  • @s_pike what about the if condition? – Anil Apr 11 '23 at 10:04
  • I'm not sure how you expect the if condition to work. Can you provide an example of your input data and expected output? – s_pike Apr 17 '23 at 10:40
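As a follow-up to the exchange above: with the row_number approach from the linked question, the partition boundary in Window.partitionBy("acc_no") performs the same reset that if first.acc_no then cycle=0 does in SAS, so no separate when(...) branch should be needed.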

0 Answers