8

I created a loop to read sqlite database into a pandas dataframe, and I am trying to merge them together based on "Code"

...
df = pandas.Dataframe()  # Creating an empty dataframe for merging at the end

items = ["tb1", "tb2", "tb3"]

for each_item in items:
    my_value = pandas.read_sql_query('select "Code", "Name", "Value" from {tb_name} where "Value" is not null'
                                     .format(tbl_name='"%s"' % each_item), con=engine)

    print(my_value)

    # This below code is my attempt to merge the dataframes that was obtained through the for loop
    merge_value = pandas.merge(my_value, df, on='Code', how='outer')

my_value results:

# tb1 results
     Code          Name      Value
0     C01         Name1   0.010000
1     C02         Name2   0.001200
2     C03         Name3   0.000300
3     C04         Name4   0.001700

# tb2 results
     Code          Name      Value
0     C03         Name3   0.010000
1     C04         Name4   0.001200
2     C05         Name5   0.000300
3     C06         Name6   0.001700

# tb3 results
     Code          Name      Value
0     C01         Name1   0.010000
1     C02         Name2   0.001200
2     C05         Name5   0.000300
3     C06         Name6   0.001700

I am trying to merge them into one table like below:

# desired results
     Code          Name    Value_x   Value_y    Value_Z
0     C01         Name1   0.010000      NULL   0.010000      
1     C02         Name2   0.001200      NULL   0.001200      
2     C03         Name3   0.000300  0.010000       NULL
3     C04         Name4   0.001700  0.001200       NULL      
4     C05         Name5       NULL  0.000300   0.000300  
5     C06         Name6       NULL  0.001700   0.001700  

How do I merge it? I tried the below, but it produces key error: Code

merge_value = pandas.merge(my_value, df, on='Code', how='outer')
jake wong
  • 4,909
  • 12
  • 42
  • 85
  • Here is a link with a solution to the loop problem (distinct from the merge problem): https://simplernerd.com/python-pandas-merge-dataframes-loop/ Basically, if you have a list of dataframes dfs, then use this code: from functools import reduce merged_df = reduce(lambda l, r: pd.merge(l, r, on='date', how='inner'), dfs) – GuyStalks Mar 30 '22 at 13:10

1 Answers1

12

You can use DataFrame.merge instead of pandas.merge

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html


Create an empty DataFrame with the columns to prevent the "key error: Code"

df = pd.DataFrame(columns=['Code']) 

then in the loop, you

df.merge(my_value, on='Code', how='outer') 

after my_value is created

Jeffrey Ram
  • 1,132
  • 1
  • 10
  • 16
  • Hello, the issue is that `tb1, tb2, tb3` is obtained through the `for` loop. So i cannot explicitly type the solution you proposed. – jake wong Sep 02 '17 at 09:32
  • 1
    Try creating an empty DataFrame with the columns df = pd.DataFrame(columns=['Code']) then in the loop, you df.merge(my_value, on='Code', how='outer') after my_value is created – Jeffrey Ram Sep 02 '17 at 09:38
  • 1
    @jakewong to keep what's being merged you can start with an initial dataframe empty or not and overwrite it with the new value in the for loop, you would have something like: first_df = pd.merge(first_df, df,on='COL_NAME',how='outer'), in this way you're merging and appending at the same time as you go along in the for loop – Maya Petranova Dec 09 '22 at 14:00