Merge in pandas does not allow a second key to join on

Question

After searching for answers and trying everything I could think of, I could not find a way out, so here it goes. I have a set of *.txt files that I want to merge by column. I am 100% sure that they all have the same structure, as follows:

File1
date       | time     | model_name1
1850-01-16 | 12:00:00 | 0.10

File2
date       | time     | model_name2
1850-01-16 | 12:00:00 | 0.50

File3..... and so on

Note: the vertical bars are just for clarity here.

Now my output should look like this:

Output
date       | time     | model_name1 | model_name2
1850-01-16 | 12:00:00 | 0.10        | 0.50

With the following piece of code

import os
from functools import reduce
import pandas as pd

out_list4 = os.listdir(out_directory)
df_list = [pd.read_table(out_path + os.fsdecode(file_x), sep=r'\s+') for file_x in out_list4]

df_merged = reduce(lambda left, right:
                   pd.merge(left, right, on=['date'], how='outer'), df_list)

df_merged.to_csv(out_path + 'merged.txt', sep='\t', index=False)

I get the following output:

Output
date       | time_x   | model_name1 | time_y   | model_name2
1850-01-16 | 12:00:00 | 0.10        | 12:00:00 | 0.50

This is as expected, since the only key is on=['date'].

Now if I try to add time as a second key, on=['date','time'], it crashes with the following error:

KeyError: 'time'

followed by a long traceback.

I tried using left_on/right_on in case "date" was being handled as the index, to no avail. I know the problem does not lie in the files; the structure is right, so it must be the code. Any help will be much appreciated. And sorry for readability on the

Seems like one of your files doesn't have the 'time' as a column heading. Or you might need to strip whitespace around your column headings. Maybe you have ' time ' or something like that. — Scott Boston, Oct 18 '17 at 15:01
Update: the problem is with the df_list. I created it again by reading each file one at a time and then passing them in to build the list, df_list = [df1, df2....and so on], and it works like a charm. Now I just have to discover what is happening with the df_list one-liner. — RIAF, Oct 19 '17 at 07:28
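
The check suggested in the comments above can be sketched as follows. This is only a minimal illustration, assuming the data frames are held in a list such as df_list; the helper name find_frames_missing_keys and the toy frames are hypothetical and not part of the original code.

```python
import pandas as pd


def find_frames_missing_keys(frames, keys):
    """Return (position, missing_keys, columns) for every frame lacking a key."""
    problems = []
    for i, df in enumerate(frames):
        # Strip stray whitespace so ' time ' is treated the same as 'time'.
        cols = [str(c).strip() for c in df.columns]
        missing = [k for k in keys if k not in cols]
        if missing:
            problems.append((i, missing, cols))
    return problems


# Toy frames (hypothetical): the second one has 'time_x' instead of 'time'.
good = pd.DataFrame({"date": ["1850-01-16"], "time": ["12:00:00"], "model_name1": [0.10]})
bad = pd.DataFrame({"date": ["1850-01-16"], "time_x": ["12:00:00"], "model_name1": [0.10]})

problems = find_frames_missing_keys([good, bad], ["date", "time"])
print(problems)
```

Running this on the real list before calling pd.merge shows which entry, if any, would trigger the KeyError, together with the column names it actually has.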

RIAF · Answer 1 · 2017-10-19T08:21:02.340

So, the problem was earlier in the code. I had previously defined "out_list4" as a list:

out_list4 = list()

and it was making a mess at the end. Each data frame in the list should have size 1872 x 3, but at the end it was adding them all together again, so that the last entry was 1872 x 12 and had no 'time' header. Changing the definition of "out_list4" to:

out_list4 = []

did the trick. The tip came from "Combine a list of pandas dataframes to one pandas dataframe".
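
For completeness, here is a minimal sketch of the two-key merge with a guard against an oversized extra entry like the one described above. It assumes (this is not confirmed anywhere in this post) that such an entry could come from a previously written merged.txt in the same directory being listed and read back in. The helper name merge_model_files and the toy files are hypothetical.

```python
import os
import tempfile
from functools import reduce

import pandas as pd

KEYS = ["date", "time"]


def merge_model_files(directory, output_name="merged.txt"):
    """Outer-merge all *.txt files in `directory` on date and time."""
    # Assumption: skip the output file so an earlier result is not re-read.
    names = sorted(
        n for n in os.listdir(directory)
        if n.endswith(".txt") and n != output_name
    )
    frames = [pd.read_csv(os.path.join(directory, n), sep=r"\s+") for n in names]
    # Fail early with a readable message instead of a KeyError inside merge.
    for name, df in zip(names, frames):
        missing = [k for k in KEYS if k not in df.columns]
        if missing:
            raise ValueError(f"{name} lacks key column(s): {missing}")
    return reduce(lambda left, right: pd.merge(left, right, on=KEYS, how="outer"), frames)


# Toy directory: two input files plus a stale merged.txt from an earlier run.
with tempfile.TemporaryDirectory() as tmp:
    for i, value in enumerate([0.10, 0.50], start=1):
        with open(os.path.join(tmp, f"file{i}.txt"), "w") as fh:
            fh.write(f"date time model_name{i}\n1850-01-16 12:00:00 {value}\n")
    with open(os.path.join(tmp, "merged.txt"), "w") as fh:
        fh.write("date time_x model_name1 time_y model_name2\n")
        fh.write("1850-01-16 12:00:00 0.1 12:00:00 0.5\n")
    merged = merge_model_files(tmp)

print(merged)
```

With both keys present in every frame, the result has a single time column rather than time_x and time_y.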
