
After searching for answers and trying everything I could think of, I still could not find a way out, so here it goes. I have a list of *.txt files that I want to merge by column. I am 100% sure that they all have the same structure, as follows:

File1
date       | time     | model_name1
1850-01-16 | 12:00:00 | 0.10

File2
date       | time     | model_name2
1850-01-16 | 12:00:00 | 0.50

File3, and so on.

Note: the vertical bars are just for clarity here.

Now my output should look like this:

Output
date       | time     | model_name1 | model_name2
1850-01-16 | 12:00:00 | 0.10        | 0.50
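A minimal sketch of the intended result, assuming two whitespace-separated files with the structure above. The in-memory io.StringIO objects are hypothetical stand-ins for the real *.txt files and hold only the sample values shown. Merging on both date and time keeps a single time column and adds one column per model:

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for File1 and File2 (whitespace-separated, as in the question)
file1 = io.StringIO("date time model_name1\n1850-01-16 12:00:00 0.10\n")
file2 = io.StringIO("date time model_name2\n1850-01-16 12:00:00 0.50\n")

frames = [pd.read_csv(f, sep=r"\s+") for f in (file1, file2)]

# Merge pairwise on BOTH key columns so 'time' is not duplicated as time_x/time_y
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["date", "time"], how="outer"),
    frames,
)
print(merged)
```

With more files, the same reduce call simply keeps folding each new frame into the running result.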

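With the following code:

import os
from functools import reduce

import pandas as pd

# out_directory and out_path are defined earlier in the script
out_list4 = os.listdir(out_directory)
df_list = [pd.read_table(out_path + os.fsdecode(file_x), sep=r'\s+') for file_x in out_list4]

# Fold all frames into one, merging pairwise on the 'date' column only
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['date'], how='outer'), df_list)

df_merged.to_csv(out_path + 'merged.txt', sep='\t', index=False)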

I get the following output:

Output
date       | time_x   | model_name1 | time_y   | model_name2
1850-01-16 | 12:00:00 | 0.10        | 12:00:00 | 0.50

This is expected, since the only merge key is on=['date'].
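The time_x/time_y pair comes from pandas' default suffixes=('_x', '_y'): any non-key column that exists in both frames is kept twice and renamed. A small sketch with two hand-built frames holding the sample values reproduces the column layout shown above:

```python
import pandas as pd

left = pd.DataFrame({"date": ["1850-01-16"], "time": ["12:00:00"], "model_name1": [0.10]})
right = pd.DataFrame({"date": ["1850-01-16"], "time": ["12:00:00"], "model_name2": [0.50]})

# 'time' is not a merge key here, so both copies are kept and suffixed
by_date = pd.merge(left, right, on=["date"], how="outer")
print(list(by_date.columns))
# ['date', 'time_x', 'model_name1', 'time_y', 'model_name2']
```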

Now, if I try to add time as a second key, on=['date','time'], it crashes with the following error:

KeyError: 'time'

followed by a long traceback.
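A KeyError: 'time' from pd.merge means that at least one of the frames being merged has no column named exactly 'time'. A sketch of a check that reports the offending entries of a list like df_list before merging; check_keys is a hypothetical helper, and the "bad" frame is made up to simulate a file whose header lacks a plain 'time' column:

```python
import pandas as pd


def check_keys(frames, keys=("date", "time")):
    """Return (position, columns) for every frame missing one of the merge keys."""
    problems = []
    for i, df in enumerate(frames):
        if any(k not in df.columns for k in keys):
            problems.append((i, list(df.columns)))
    return problems


good = pd.DataFrame({"date": ["1850-01-16"], "time": ["12:00:00"], "model_name1": [0.10]})
# Made-up frame without a plain 'time' column (e.g. the result of an earlier date-only merge)
bad = pd.DataFrame({"date": ["1850-01-16"], "time_x": ["12:00:00"], "model_name1": [0.10]})

print(check_keys([good, bad]))
# [(1, ['date', 'time_x', 'model_name1'])]

try:
    pd.merge(good, bad, on=["date", "time"])
except KeyError as exc:
    # This is the same error reported in the question
    print("KeyError:", exc)
```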

I tried using left_on/right_on in case "date" was being handled as an index. No use. I know the problem does not lie in the files; the structure is right, so it is the code. Any help will be much appreciated. And sorry for the readability of the

RIAF
  • Seems like one of your files doesn't have 'time' as a column heading. Or you might need to strip whitespace around your column headings; maybe you have ' time ' or something like that. – Scott Boston Oct 18 '17 at 15:01
  • Could you post a short extract of df_list? – Phik Oct 18 '17 at 15:02
  • Update: the problem is with df_list. I recreated it by reading each file one at a time and then passing the frames in explicitly, df_list = [df1, df2, ... and so on], and it works like a charm. Now I just have to discover what is happening with the df_list one-liner. – RIAF Oct 19 '17 at 07:28
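Scott Boston's whitespace suggestion can be checked with a short sketch. It assumes one file's header was parsed with a padded name such as ' time '; the frame below is made up for illustration. pandas' Index.str.strip removes the padding so the merge keys match again:

```python
import pandas as pd

# Hypothetical frame whose 'time' header kept surrounding spaces
df = pd.DataFrame({"date": ["1850-01-16"], " time ": ["12:00:00"], "model_name1": [0.10]})
print("time" in df.columns)  # False: the label is ' time ', not 'time'

# Strip leading/trailing whitespace from every column name before merging
df.columns = df.columns.str.strip()
print("time" in df.columns)  # True
```

Applying the same strip to every frame right after reading it is cheap and makes the merge keys robust to stray spaces in headers.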

1 Answer


So, the problem was earlier in the code. I had previously defined out_list4 as a list:

out_list4 = list()

and it was causing a mess at the end. Each DataFrame in the list should have size 1872 x 3, but at the end it was adding them all together again, so the last entry was 1872 x 12 with no 'time' header. Changing the definition of out_list4 to:

out_list4 = []

did the trick. The tip came from the question "Combine a list of pandas dataframes to one pandas dataframe".
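In Python, list() and [] both return a new empty list. If a 1872 x 12 entry without a 'time' header ever shows up again, another thing worth ruling out is whether merged.txt, which the question's code writes to out_path, is picked up by os.listdir(out_directory) on a later run. This is an assumption, not something confirmed in the thread. A minimal sketch that reads only the input files and skips the merged output; load_inputs, merge_all and the sample files are hypothetical:

```python
import os
import tempfile
from functools import reduce

import pandas as pd


def load_inputs(directory, output_name="merged.txt"):
    """Read every *.txt file in `directory` except the merged output (hypothetical helper)."""
    frames = []
    for name in sorted(os.listdir(directory)):
        # Skip non-.txt files and any previously written merge result
        if not name.endswith(".txt") or name == output_name:
            continue
        frames.append(pd.read_csv(os.path.join(directory, name), sep=r"\s+"))
    return frames


def merge_all(frames, keys=("date", "time")):
    """Outer-merge all frames pairwise on the shared key columns."""
    return reduce(lambda left, right: pd.merge(left, right, on=list(keys), how="outer"), frames)


# Demo with made-up sample files, including a stale merged.txt from an earlier run
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "file1.txt": "date time model_name1\n1850-01-16 12:00:00 0.10\n",
        "file2.txt": "date time model_name2\n1850-01-16 12:00:00 0.50\n",
        "merged.txt": "date time_x model_name1 time_y model_name2\n"
                      "1850-01-16 12:00:00 0.10 12:00:00 0.50\n",
    }
    for fname, text in samples.items():
        with open(os.path.join(tmp, fname), "w") as fh:
            fh.write(text)
    frames = load_inputs(tmp)
    result = merge_all(frames)

print(result)
```

Writing the merged file to a separate output directory would avoid the issue altogether.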

RIAF