0

Im trying to merge only some columns of two dataframes. But Im having a error in subject.

import pandas as pd

planilha = pd.read_excel('campanha.xlsx', None)

pastas = list(planilha.keys())

campanha0 = pastas[0]

tabela = planilha['pluma_espuma_1'].merge(planilha['campanha_anexo3_mar_0'][['subprojet', 'Campanha', 'Pernada']], left_on='ParentGlobalID', right_on='GlobalID', how='left')
print(tabela.head())

but got the following error:

Traceback (most recent call last):
  File "D:\DADOS\Python\trabalhos\Logistica_rrdm\teste_excel_pd.py", line 19, in <module>
    tabela = planilha['pluma_espuma_1'].merge(planilha['campanha_anexo3_mar_0'][['subprojet', 'Campanha', 'Pernada']], left_on='ParentGlobalID', right_on='GlobalID', how='left')
  File "C:\Users\Aroldo\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 8195, in merge
    return merge(
  File "C:\Users\Aroldo\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\reshape\merge.py", line 74, in merge
    op = _MergeOperation(
  File "C:\Users\Aroldo\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\reshape\merge.py", line 668, in __init__
    ) = self._get_merge_keys()
  File "C:\Users\Aroldo\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\reshape\merge.py", line 1033, in _get_merge_keys
    right_keys.append(right._get_label_or_level_values(rk))
  File "C:\Users\Aroldo\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py", line 1684, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'GlobalID'

About my data, I'm getting data from excel, like above, but constructing a dataframe my inputs looks exactly like this:

import pandas as pd
pluma_espuma_1 = pd.DataFrame({'ObjectID': [3, 4],
                   'GlobalID': ['a431fd6a-24f6-436e-a3b4-e7d8b44c80a3', 'b5ad25e8-9c99-40fd-b838-4127c6457f59'],
                    'Data': ['21/04/2021', '01/05/2021'],
                    'Estacao': [1500, 5500],
                    'ParentGlobalID': ['29aaebfa-67bb-4395-9d72-5aa19fcda267', 'e610b5e0-bf10-4239-90bb-3d2099e009e0']})

campanha_anexo3_mar_0 = pd.DataFrame({'ObjectID': [2, 3],
                    'GlobalID': ['29aaebfa-67bb-4395-9d72-5aa19fcda267', 'e610b5e0-bf10-4239-90bb-3d2099e009e0'],
                    'subproject': ['Marinho - Integrado', 'Dulcicola'],
                    'Campanha': [22, 23],
                    'Pernada': [1, 1],
                    'Creator': ['Ed_tty', 'Haruald']})

Any idea what should be the right way to do this? Thanks!

data sample, picture

  • Yes, your merge statement looks odd. What columns are you trying to merge? It will help clarify your question if you provide a sample of your dataframe (or dataframes) and show your desired output. For more information, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391). You can [edit](https://stackoverflow.com/posts/67828983/edit) your post. – AlexK Jun 03 '21 at 22:10
  • you are merging the same dataframe together -- essentially planilha.merge(planilha). you want df1.merge(df2) with the correct arguments https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html – Jonathan Leon Jun 03 '21 at 22:54
  • Also, make sure that ParentGlobalID exists in your left dataframe before you merge. By using `planilha['pluma_espuma_1'].merge()`, you're first dropping all columns except `pluma_espuma_1`, then trying to merge on `ParentGlobalID` (which is no longer in the dataframe). Maybe try `planilha[['ParentGlobalID', 'pluma_espuma_1']].merge(...)`? – Michael Delgado Jun 03 '21 at 23:01
  • Otherwise, I'd definitely read the article referenced by @AlexK, and also the StackOverflow guide on [how to ask](https://stackoverflow.com/help/how-to-ask). It's hard for us to diagnose your problem without a minimal reproducible example. Try breaking your issue down and inspecting the code at each step, including breaking apart your erroring line into multiple steps to make sure it's doing what you think it is. – Michael Delgado Jun 03 '21 at 23:09
  • Thanks you all, for time spend and instructions... – Aroldo Rocha Jun 06 '21 at 23:07

1 Answers1

0

You can play around with column names to get it the way you want, but this merges the two

pluma_espuma_1.merge(campanha_anexo3_mar_0[['GlobalID', 'subproject', 'Campanha', 'Pernada']], left_on='ParentGlobalID', right_on='GlobalID', how='inner')


   ObjectID                            GlobalID_x        Data  Estacao                        ParentGlobalID                            GlobalID_y           subproject  Campanha  Pernada
0         3  a431fd6a-24f6-436e-a3b4-e7d8b44c80a3  21/04/2021     1500  29aaebfa-67bb-4395-9d72-5aa19fcda267  29aaebfa-67bb-4395-9d72-5aa19fcda267  Marinho - Integrado        22        1
1         4  b5ad25e8-9c99-40fd-b838-4127c6457f59  01/05/2021     5500  e610b5e0-bf10-4239-90bb-3d2099e009e0  e610b5e0-bf10-4239-90bb-3d2099e009e0            Dulcicola        23        1
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14