After looking for answers and trying everything could not figure out a way out, so here it goes. I have a list of *.txt files that I want to merge by column. I am 100% sure that they have the same structure, as follows
File1
date | time | model_name1
1850-01-16 | 12:00:00 | 0.10
File2
date | time | model_name2
1850-01-16 | 12:00:00 | 0.50
File3..... and so on
Note: the vertical bars are just for clarity here.
Now my output should look like this:
Output
date | time | model_name1 | model_name2
1850-01-16 | 12:00:00 | 0.10 | 0.50
With the following piece of code
out_list4 = os.listdir(out_directory)
df_list = [pd.read_table(out_path+os.fsdecode(file_x), sep='\s+') for file_x in out_list4]
df_merged = reduce(lambda left,right: ,
pd.merge(left,right,on=['date'], how='outer'), df_list)
pd.DataFrame.to_csv(df_merged, out_path+'merged.txt', sep='\t', index=False)
I manage the following output:
Output
date | time_x | model_name1 |time_y | model_name2
1850-01-16 | 12:00:00 | 0.10 |12:00:00| 0.50
As expected since I only have the key ""on=['date']"".
Now if I try to write time as second key as follows: ""on=['date','time']"", it crashes with the following error:
Key error:'time'
and a long list of tracebacks.
I tried placing left_on/righ_on in case "date" was being handled as index. No use. I know the problem does not lie on the files, the structure is right, it is the code. Any help will be much appreciated. And sorry for readibility on the