
My code creates files by passing dataframe rows through a function in a loop. Once every row has been processed, the loop returns None, which prevents my template from being fully populated with data.

How can I stop the loop once all data has been processed, so that None is not the last result?

import re

import pandas as pd

def template_input_col(df,i):

    col = df['ColumnName'].iat[i]
    dt_type = df['SSISType_src'].iat[i]
    dt_lngh = df['DataTypeLength_src'].iat[i]
    dt_prc = df['DataTypePrecision_src'].iat[i]
    dt_scl = df['DataTypeScale_src'].iat[i]

    input_columns = f'''\
    <inputColumn 
        refId="Package\\DFT\\DST.Inputs[DST Input].Columns[{col}]"
        cachedDataType="{dt_type}"
        cachedName="{col}"
        cachedLength="{dt_lngh}"
        cachedPrecision="{dt_prc}"
        cachedScale="{dt_scl}"
        externalMetadataColumnId="Package\\DFT\\DST.Inputs[DST Input].ExternalColumns[{col}]"
        lineageId="Package\\DFT\\SRC.Outputs[SRC Output].Columns[{col}]" />'''

    if dt_type in {'str','wstr','bytes'}:
        input_columns = re.sub(r'\s*cachedPrecision=".*"', '', input_columns)
        input_columns = re.sub(r'\s*cachedScale=".*"', '', input_columns)
    elif dt_type in {'numeric'}:
        input_columns = re.sub(r'\s*cachedLength=".*"', '', input_columns)
    elif dt_type in {'decimal' , 'dbtime2' , 'dbTimeStamp2' , 'dbTimeStampoffset'}:
        input_columns = re.sub(r'\s*cachedLength=".*"', '', input_columns)
        input_columns = re.sub(r'\s*cachedPrecision=".*"', '', input_columns)
    else:
        #input_columns = input_columns
        input_columns = re.sub(r'\s*cachedLength=".*"', '', input_columns)
        input_columns = re.sub(r'\s*cachedPrecision=".*"', '', input_columns)
        input_columns = re.sub(r'\s*cachedScale=".*"', '', input_columns)

    return input_columns

def output_input_col():
    for idx, row in df.iterrows():
        if not pd.isna(row['DataTypeName_src']) and not pd.isna(row['DataTypeName_dst']):
            return template_input_col(df,idx)

print(output_input_col())

Once the data is processed, it has to be injected as follows:

line = line.replace('<DST_Input_Columns_Placeholder>',  output_input_col())

The expected result must retain this format:

<inputColumn
                        refId="Package\DFT\DST.Inputs[DST Input].Columns[created_at]"
                        cachedDataType="dbTimeStamp"
                        cachedName="created_at"
                        externalMetadataColumnId="Package\DFT\DST.Inputs[DST Input].ExternalColumns[created_at]"
                        lineageId="Package\DFT\SRC.Outputs[SRC Output].Columns[created_at]" />
<inputColumn
                        refId="Package\DFT\DST.Inputs[DST Input].Columns[updated_at]"
                        cachedDataType="dbTimeStamp"
                        cachedName="updated_at"
                        externalMetadataColumnId="Package\DFT\DST.Inputs[DST Input].ExternalColumns[updated_at]"
                        lineageId="Package\DFT\SRC.Outputs[SRC Output].Columns[updated_at]" />
<inputColumn
                        refId="Package\DFT\DST.Inputs[DST Input].Columns[deleted_at]"
                        cachedDataType="dbTimeStamp"
                        cachedName="deleted_at"
                        externalMetadataColumnId="Package\DFT\DST.Inputs[DST Input].ExternalColumns[deleted_at]"
                        lineageId="Package\DFT\SRC.Outputs[SRC Output].Columns[deleted_at]" />
martineau
marcin2x4
  • There's no `return` statement in `output_input_col()`. What are you expecting to print? – Barmar Sep 23 '21 at 23:27
  • There may be some confusion. Your function isn't returning `None` because of what may or may not be the last value used in the `for` loop. It returns `None` because it has no explicit `return` statement. – sj95126 Sep 23 '21 at 23:28
  • `template_input_col` returns a string, but you're not doing anything with it in your `for` loop. So all this code really does nothing. – Barmar Sep 23 '21 at 23:29
  • `return` does the job but I'm getting only last set of data injected even though there are 1+ rows in the dataframe – marcin2x4 Sep 23 '21 at 23:31
  • `return` ends the function, so it stops the loop. – Barmar Sep 23 '21 at 23:31

1 Answer


If the `if` condition is never true for any row of the dataframe, the function never executes the `return` statement, so it returns `None`.

When the condition is true, the function stops at that row of the dataframe, because `return` ends the function and therefore the loop.
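
Both behaviours can be reproduced without pandas. This is only a minimal sketch: `first_match` and its input lists are hypothetical names made up for illustration, not part of the code in the question.

```python
def first_match(values):
    """Return the first non-None value; mirrors a `return` inside a `for` loop."""
    for value in values:
        if value is not None:
            return value  # exits the function, so later values are never visited
    # Falling off the end of a function returns None implicitly.


print(first_match([None, "a", "b"]))  # prints: a  ("b" is never reached)
print(first_match([None, None]))      # prints: None (the condition never held)
```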

If you want all the templates, append them to a list and return that list after the loop.

def output_input_col():
    results = []
    for idx, row in df.iterrows():
        if not pd.isna(row['DataTypeName_src']) and not pd.isna(row['DataTypeName_dst']):
            results.append(template_input_col(df,idx))
    return results
Barmar
  • Thanks Barmar! This will work but one thing I need to retain is the structure of each block that is created while processing the template. Creating a list which I can convert back to string crashes my established formatting. I updated my case with what is happening later. – marcin2x4 Sep 23 '21 at 23:40
  • I don't understand how you expect that to work. What do you expect `output_input_col()` to return so you can use it in `replace()`? The second argument must be a string. – Barmar Sep 23 '21 at 23:44
  • I played around and `"\n".join(output_input_col())` returns the desired result :) `output_input_col()` was to return n-templates filled with data from dataframe per row in a specified format. – marcin2x4 Sep 23 '21 at 23:46
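
To tie the thread together, here is a rough sketch of the collect-then-join approach feeding `str.replace()`. So that it runs without pandas, `rows` is a hypothetical stand-in for the dataframe and `render_column` is a simplified, hypothetical stand-in for `template_input_col`. In this sketch `output_input_col` takes the rows as an argument and returns the joined string itself, which is an assumption and differs from the versions above.

```python
# Hypothetical stand-in for the dataframe: one dict per row.
rows = [
    {"ColumnName": "created_at", "SSISType_src": "dbTimeStamp"},
    {"ColumnName": "updated_at", "SSISType_src": "dbTimeStamp"},
]


def render_column(row):
    """Build one simplified <inputColumn> block (stand-in for template_input_col)."""
    return (
        "<inputColumn\n"
        f'    cachedDataType="{row["SSISType_src"]}"\n'
        f'    cachedName="{row["ColumnName"]}" />'
    )


def output_input_col(rows):
    """Collect one block per row, then join the blocks into a single string."""
    blocks = [render_column(row) for row in rows]
    # Joining with a newline keeps each block's own internal line breaks intact.
    return "\n".join(blocks)


line = "<DST_Input_Columns_Placeholder>"
line = line.replace("<DST_Input_Columns_Placeholder>", output_input_col(rows))
print(line)
```

Because the join happens inside the function, the caller always receives a string, which is what the second argument of `replace()` requires. An empty input yields an empty string rather than `None`.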