2

The objective of the code below is to implement a FULL OUTER JOIN using three pandas data frames. All records of all data frames should be printed, and if there's a relationship between two or three records, then they should be printed in the same line.

The fields to use to relate the data frames are type_1 and id_1 in the first data frame, type_2 and id_2 in the second data frame and type_3 and id_3 in the third data frame.

The problem is that the relationship between the second and third data frames is not working. Take a look at the case in line 11 and 13, it should be a single line because type_2 = type_3 and id_2 = id_3. The expected output is in line 11 11 NaN NaN NaN 7.0 8 KoKo 7.0 8 Kuku and line 13 shouldn't be printed. How to fix this?

import pandas as pd
raw_data = {
        'type_1': [0, 1, 1, 2, 2],
        'id_1': ['3', '4', '5', '3', '3'],
        'name_1': ['Alex', 'Amy', 'Allen', 'Peter', 'Liz']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'id_1', 'name_1' ])

raw_datab = {
        'type_2': [1, 1, 1, 0,7],
        'id_2': ['4', '5', '5', '7', '8'],
        'name_2': ['Billy', 'Brian', 'Joe', 'Bryce', 'KoKo']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'id_2', 'name_2'])

raw_datac = {
        'type_3': [1, 1, 1, 1, 2, 2, 7],
        'id_3': ['4', '6', '5', '5', '3', '3','8'],
        'name_3': ['School', 'White', 'Jane', 'Homer', 'Paul', 'Lorel', 'Kuku']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'id_3', 'name_3'])

merged = df_a
merged = merged.merge(df_b, how='outer', left_on=['type_1', 'id_1'],
                      right_on=['type_2', 'id_2'])
merged = merged.merge(df_c, how='outer', left_on=['type_1', 'id_1'], 
                      right_on=['type_3', 'id_3'])

print(merged)

Result:

    type_1 id_1 name_1  type_2 id_2 name_2  type_3 id_3  name_3
0      0.0    3   Alex     NaN  NaN    NaN     NaN  NaN     NaN
1      1.0    4    Amy     1.0    4  Billy     1.0    4  School
2      1.0    5  Allen     1.0    5  Brian     1.0    5    Jane
3      1.0    5  Allen     1.0    5  Brian     1.0    5   Homer
4      1.0    5  Allen     1.0    5    Joe     1.0    5    Jane
5      1.0    5  Allen     1.0    5    Joe     1.0    5   Homer
6      2.0    3  Peter     NaN  NaN    NaN     2.0    3    Paul
7      2.0    3  Peter     NaN  NaN    NaN     2.0    3   Lorel
8      2.0    3    Liz     NaN  NaN    NaN     2.0    3    Paul
9      2.0    3    Liz     NaN  NaN    NaN     2.0    3   Lorel
10     NaN  NaN    NaN     0.0    7  Bryce     NaN  NaN     NaN
11     NaN  NaN    NaN     7.0    8   KoKo     NaN  NaN     NaN
12     NaN  NaN    NaN     NaN  NaN    NaN     1.0    6   White
13     NaN  NaN    NaN     NaN  NaN    NaN     7.0    8    Kuku
ps0604
  • 1,227
  • 23
  • 133
  • 330
  • the rows 11 and 13 have different `type_2` and `id_2` - they are in the merged dataframe after first merge. their `type_3` and `id_3` also different -why is this result unsatisfactory for your task? – Evgeny Jun 15 '18 at 00:43
  • In the case of rows 11 and 13, type_2 = type_3 and id_2 = id_3 that's why they should be in a single row – ps0604 Jun 15 '18 at 00:46
  • but there is no merge command that enforces it. are you sure about `left_on=['type_1', 'id_1']`? – Evgeny Jun 15 '18 at 00:50
  • I'm not sure, and that's probably the problem, but how can I tell pandas to use either `type_1/id_1` or `type_2/id_2` as a result of the first merge to merge with a third data frame? it should work as an SQL FULL OUTER JOIN of three tables. – ps0604 Jun 15 '18 at 00:56
  • First you have duplicate key it is hard to merge – BENY Jun 15 '18 at 01:26
  • @Wen It should work similar to SQL, if there are two records with the same key related to another record in a different table, then the result should be two records printed out – ps0604 Jun 15 '18 at 01:27
  • When two key merge with two same key it will produced 4 – BENY Jun 15 '18 at 01:28
  • @Wen, shouldn't pandas `merge` function work similar to SQL full outer join when merging three or more data frames? – ps0604 Jun 15 '18 at 01:29
  • I think they should be same , can you show your expected output ? – BENY Jun 15 '18 at 01:31
  • The expected output is in line 11 `11 NaN NaN NaN 7.0 8 KoKo 7.0 8 Kuku` and line 13 shouldn't be printed – ps0604 Jun 15 '18 at 01:32
  • I got it add an answer – BENY Jun 15 '18 at 01:43

1 Answers1

2

You need to create a merge key before merge

df_a[['key1','key2']]=df_a[['type_1', 'id_1']]
df_b[['key1','key2']]=df_b[['type_2', 'id_2']]
df_c[['key1','key2']]=df_c[['type_3', 'id_3']]


merged = df_a
merged = merged.merge(df_b, how='outer')
merged = merged.merge(df_c, how='outer')
merged.drop(['key1','key2'],1)
Out[81]: 
    type_1 id_1 name_1  type_2 id_2 name_2  type_3 id_3  name_3
0      0.0    3   Alex     NaN  NaN    NaN     NaN  NaN     NaN
1      1.0    4    Amy     1.0    4  Billy     1.0    4  School
2      1.0    5  Allen     1.0    5  Brian     1.0    5    Jane
3      1.0    5  Allen     1.0    5  Brian     1.0    5   Homer
4      1.0    5  Allen     1.0    5    Joe     1.0    5    Jane
5      1.0    5  Allen     1.0    5    Joe     1.0    5   Homer
6      2.0    3  Peter     NaN  NaN    NaN     2.0    3    Paul
7      2.0    3  Peter     NaN  NaN    NaN     2.0    3   Lorel
8      2.0    3    Liz     NaN  NaN    NaN     2.0    3    Paul
9      2.0    3    Liz     NaN  NaN    NaN     2.0    3   Lorel
10     NaN  NaN    NaN     0.0    7  Bryce     NaN  NaN     NaN
11     NaN  NaN    NaN     7.0    8   KoKo     7.0    8    Kuku
12     NaN  NaN    NaN     NaN  NaN    NaN     1.0    6   White
BENY
  • 317,841
  • 20
  • 164
  • 234