
I understand that the null value NaN in pandas is a float, and that this is why integers in a column containing NaN are converted to float.
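A minimal sketch of that behaviour, using made-up series (the names `with_nan` and `nullable` are only for illustration). It compares a default column containing NaN with the nullable `Int64` extension dtype:

```python
import numpy as np
import pandas as pd

# A column of whole numbers plus a missing value is upcast to float64,
# because NaN is itself a float.
with_nan = pd.Series([1, 2, np.nan])

# The nullable "Int64" extension dtype keeps the integers and stores
# the missing value as <NA> instead.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(with_nan.dtype)            # float64
print(nullable.dtype)            # Int64

# Converting single values to text shows where the ".0" comes from.
print(str(with_nan.iloc[0]))     # 1.0
print(str(nullable.iloc[0]))     # 1
```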

However, I have a script that merges multiple data sources to produce an XPT file. My issue is that the format of the XPT file is dictated by the SDTM international standard.

Laboratory results can be integers, floats, or even text (positive, negative). In SDTM, I have two result columns: LBORRES, which is character, and LBSTRESN, which is numeric.

The character column must store integers (without a decimal part), floats and strings. Is that possible?

I have no problem with the numeric column. In the character column, however, integers are converted to floats.

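import numpy as np
import pandas as pd

# sdtm_variables is defined elsewhere in the script (not shown)
df = pd.DataFrame(columns = sdtm_variables)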

# reference table = randomisation
df_randomisation = pd.read_csv('./csv/source/randomisation.csv',delimiter=",").fillna('NULL')
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='redcap_repeat_instance'] # exclude column redcap_repeat_instance as randomisation forms are unique
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='redcap_event_name'] # exclude column redcap_event_name as randomisation forms are unique
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='ID'] # exclude column ID as randomisation forms are unique
df = pd.merge(df,df_randomisation, on='pat_ide', how='outer').fillna('NULL')


# 1. raw data retrieved

df_laboratory = pd.read_csv('./csv/source/biologie_labo.csv',delimiter=",").fillna('NULL')
# df_laboratory = df_laboratory[['pat_ide','redcap_event_name','redcap_repeat_instance','ID']]
df = pd.merge(df,df_laboratory, on='pat_ide', how='outer').fillna('NULL').sort_values(by=['pat_ide','lab_dat'])

# creation of an empty DataFrame to which lines will be added for each disease of a patient
tmp_df_laboratory = pd.DataFrame(columns = sdtm_variables)
tmp_df_laboratory['ran_trt'] = None


# df = df[(df['pat_ide'] == 'BFBO001')]
# print(df)

# list of biological analysis
labos = pd.read_excel('LABO.xls',sheet_name='EXAMS')

for index, row in df.iterrows(): # .convert_dtypes() should keep integers from becoming floats when values are missing (NaN is a float, so the column is converted to float), but it does not work here
    for k,v in labos.iterrows():
        if row['ID'] != 'NULL':
            # Biological analysis with a unit selected by the user
            # and not corresponding to: coagulation (prothrombin level, INR) and hepatitis and HIV serology
            # if f"{v['VAR']}"[-4:] != '_uni' and f"{v['VAR']}" not in ['lab_pro','lab_inr','lab_hbs','lab_hcv','lab_vih','lab_hcg','lab_gro']:
            # if f"{v['VAR']}" in ['lab_pro','lab_inr','lab_hbs','lab_hcv','lab_vih','lab_hcg','lab_gro']:
            if f"{v['CATEGORY']}" == 'NFS':
                tmp_df_laboratory = tmp_df_laboratory.append({
                    ...
                    'LBORRES' : str(row[f"{v['VAR']}"]) if row[f"{v['VAR']}"] != 'NULL' else '',
                    'LBSTRESN' : row[f"{v['VAR']}"] if row[f"{v['VAR']}"] != 'NULL' else np.nan,
                },ignore_index=True) 
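One possible way to build the character value is sketched below. The helper name `to_lborres` is hypothetical and not part of the script above. It assumes that a float with no fractional part was originally an integer, so a source value written as `12.0` would also come out as `12`.

```python
import math

def to_lborres(value):
    """Hypothetical helper: format one raw result for a character column."""
    # Text results such as 'positive' are kept; the 'NULL' marker
    # used by the script becomes an empty string.
    if isinstance(value, str):
        return '' if value == 'NULL' else value
    if value is None:
        return ''
    # numpy.float64 is a subclass of float, so this branch also
    # covers values read by pandas.
    if isinstance(value, float):
        if math.isnan(value):
            return ''
        if value.is_integer():
            return str(int(value))  # 12.0 -> '12'
    return str(value)

print(to_lborres(12.0))        # 12
print(to_lborres(4.5))         # 4.5
print(to_lborres('positive'))  # positive
```

If the text of the source file has to be preserved exactly, reading the CSV with `dtype=str` in `pd.read_csv` might be an alternative, at the cost of converting the numeric column separately.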
            

[image]

Expected output (note that the last line is empty):

[image]

Mereva
  • What is your expected output? – Mayank Porwal Oct 14 '21 at 15:07
  • In pandas 0.24, nullable integer types were introduced as extension types. See [the docs](https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support); you can convert your column with `column=column.astype('Int64')` (note the quotes around the dtype) – G. Anderson Oct 14 '21 at 15:23
  • Does this answer your question? [NumPy or Pandas: Keeping array type as integer while having a NaN value](https://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value) – G. Anderson Oct 14 '21 at 15:23
  • I do not really understand how to convert my column. I define a DataFrame tmp_df_laboratory and tried something like `tmp_df_laboratory['LBORRES'].astype('Int64')`, but that doesn't work – Mereva Oct 14 '21 at 15:47
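For reference, a small sketch of the conversion suggested in the comments, on a made-up column rather than the real data. `astype` returns a new Series and leaves the original untouched, so the result has to be assigned back. The conversion is only expected to succeed when every non-missing value is a whole number, which would not hold for a column that also contains decimals or text.

```python
import numpy as np
import pandas as pd

# Made-up column: whole numbers stored as floats because of the NaN.
column = pd.Series([1.0, 2.0, np.nan])

# astype does not modify the Series in place; keep the returned one.
converted = column.astype('Int64')

print(column.dtype)     # float64 (unchanged)
print(converted.dtype)  # Int64
```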
