I understand that the null value NaN in pandas is a float, and that this is why integers are converted to float in a column that contains NaN.
But I have a script that merges multiple sources of data to produce an xpt file. My issue is that the xpt file format is defined by the SDTM international standard.
Laboratory results can be integers, floats and even text (positive, negative). In SDTM, I have 2 result columns: one char (LBORRES) and one numeric (LBSTRESN).
The char column must store integers (without decimals), floats and strings. Is this possible?
I have no problem with the numeric column. But in the char column, integers are converted to float.
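To illustrate the conversion described above, here is a minimal reproduction. The values are made up for the example and are not the real laboratory data; the variable names are hypothetical too.

```python
import pandas as pd

# Integers only: the column keeps an integer dtype.
complete = pd.Series([12, 7, 3])

# Same kind of values plus one missing entry: NaN is a float,
# so the whole column is stored as float64.
with_gap = pd.Series([12, 7, None])

# Converting to text afterwards keeps the unwanted decimal part.
as_text = with_gap.astype(str).tolist()

print(complete.dtype)
print(with_gap.dtype)
print(as_text)
```

This is the effect seen in the char column: once a value has passed through a float column, `str()` renders it with a trailing `.0`.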
df = pd.DataFrame(columns = sdtm_variables)
# reference table = randomisation
df_randomisation = pd.read_csv(f'./csv/source/randomisation.csv',delimiter=",").fillna('NULL')
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='redcap_repeat_instance'] # exclude column redcap_repeat_instance as randomisation forms are unique
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='redcap_event_name'] # exclude column redcap_event_name as randomisation forms are unique
df_randomisation = df_randomisation.loc[:, df_randomisation.columns!='ID'] # exclude column ID as randomisation forms are unique
df = pd.merge(df,df_randomisation, left_on='pat_ide', right_on='pat_ide', how='outer').fillna('NULL')
# 1. raw data retrieved
df_laboratory = pd.read_csv(f'./csv/source/biologie_labo.csv',delimiter=",").fillna('NULL')
# df_laboratory = df_laboratory[['pat_ide','redcap_event_name','redcap_repeat_instance','ID']]
df = pd.merge(df,df_laboratory, left_on='pat_ide', right_on='pat_ide', how='outer').sort_values(by=['pat_ide','lab_dat']).fillna('NULL').sort_values(by=['pat_ide','lab_dat'])
# create an empty dataframe to which lines are added for each disease of a patient
tmp_df_laboratory = pd.DataFrame(columns = sdtm_variables)
tmp_df_laboratory['ran_trt'] = None
# df = df[(df['pat_ide'] == 'BFBO001')]
# print(df)
# list of biological analysis
labos = pd.read_excel('LABO.xls',sheet_name='EXAMS')
for index, row in df.iterrows(): # .convert_dtypes() should keep integers from becoming floats when values are missing (NaN is a float, so the column is converted to float) => does not work
    for k,v in labos.iterrows():
        if row['ID'] != 'NULL':
            # Biological analysis with unit selected by user
            # and not corresponding to: coagulation (prothrombin ratio, INR) and hepatitis and HIV serology
            # if f"{v['VAR']}"[-4:] != '_uni' and f"{v['VAR']}" not in ['lab_pro','lab_inr','lab_hbs','lab_hcv','lab_vih','lab_hcg','lab_gro']:
            # if f"{v['VAR']}" in ['lab_pro','lab_inr','lab_hbs','lab_hcv','lab_vih','lab_hcg','lab_gro']:
            if f"{v['CATEGORY']}" == 'NFS':
                tmp_df_laboratory = tmp_df_laboratory.append({
                    ...
                    'LBORRES' : str(row[f"{v['VAR']}"]) if row[f"{v['VAR']}"] != 'NULL' else '',
                    'LBSTRESN' : row[f"{v['VAR']}"] if row[f"{v['VAR']}"] != 'NULL' else np.nan,
                },ignore_index=True)
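One possible direction for the `LBORRES` value is sketched below. The helper `to_lborres` is hypothetical and not part of the script. It assumes that a float with no fractional part was an integer in the source data, so a result that was really recorded as `5.0` would also be written as `5`.

```python
import math

def to_lborres(value):
    """Hypothetical helper: render one lab result as text for the char column."""
    # Text results ('positive', 'negative') pass through unchanged;
    # the 'NULL' placeholder used in the script becomes an empty string.
    if isinstance(value, str):
        return '' if value == 'NULL' else value
    if value is None:
        return ''
    if isinstance(value, float):
        # NaN means missing.
        if math.isnan(value):
            return ''
        # Assumption: a float without a fractional part was an integer upstream.
        if value.is_integer():
            return str(int(value))
    return str(value)

print(to_lborres(12.0))
print(to_lborres(3.5))
print(to_lborres('positive'))
```

With such a helper, the `'LBORRES'` entry could be written as `to_lborres(row[f"{v['VAR']}"])`. If the original text of each result has to be kept exactly, reading the source files with `pd.read_csv(..., dtype=str)` may be a better fit, at the cost of converting the numeric column separately.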
Expected output (note that the last line is empty):