0

I'm trying to merge two dataframe here is my code :

how i load data :

import pandas as pd 
df_population = pd.read_csv("C:/Users/Satgiaire 
05/Downloads/population_commune.csv",encoding = 'iso-8859-1') 

How i merge data

df_merged2 = pd.merge(df_population, df_merged, left_on=['CODGEO'],
right_on=['Code_commune_INSEE'],how='inner')

it succeed but the name of the column CODGEO has changed and know i have CODGEO_X . I have CODGEO in both data frame and i wanna have only CODGEO WITHOUT X

Result:enter image description here

EdChum
  • 376,765
  • 198
  • 813
  • 562
amine bak
  • 83
  • 1
  • 11
  • Because you have values that clash in the dfs, so it creates a new column with suffix X and Y to show you where the original values came from. You need to post raw data, code to load the dfs and the desired result as this behaviour is [documented](http://pandas.pydata.org/pandas-docs/stable/merging.html#overlapping-value-columns) – EdChum Apr 25 '17 at 12:59
  • i have only one CODGEO_x in the result. i want CODGEO without x or y. or i wanna merge without duplicating columns. – amine bak Apr 25 '17 at 13:11
  • Are you sure there is no `CODGEO_Y` also after the merge? – EdChum Apr 25 '17 at 13:18
  • i just cheked. i found it cause there is a dashed cause i have a lot of columns. Sorry, so know i have to delete one and change the name of the other? or there is a way to merge without having duplicates columns – amine bak Apr 25 '17 at 13:23
  • You need to decide what the desired result should be, basically you have a clash of values so you need to decide whether you want the lhs/rhs/all rows from the merge – EdChum Apr 25 '17 at 13:27
  • i want all rows but without duplicating columns that have same values. like here CODGEO. – amine bak Apr 25 '17 at 13:31
  • So you don't have CODGEO_Y or Code_commune_INSEE_Y etc.., you can drop either of these and rename: http://stackoverflow.com/questions/19758364/python-rename-single-column-header-in-pandas-dataframe – EdChum Apr 25 '17 at 13:34

1 Answers1

1

The changes to the names that you see are defined by the suffixes= kwarg in pandas.DataFrame.merge. This option only gets utilized if there are columns of the same name that are not used for the merge. For example:

df1 = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
df2 = pd.DataFrame({'a': [1,2,3], 'c': [4,5,6]})
pd.merge(df1, df2, left_on=['b'], right_on=['c'])
   a_x  b  a_y  c
0    1  4    1  4
1    2  5    2  5
2    3  6    3  6

Whereas:

df1 = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
df2 = pd.DataFrame({'a': [1,2,3], 'c': [4,5,6]})
pd.merge(df1.drop('a', axis=1), df2, left_on=['b'], right_on=['c'])
   b  a  c
0  4  1  4
1  5  2  5
2  6  3  6

So you will need to identify columns that you know to be duplicates that you are not using as part of your merge, and drop them from one of your DataFrames.

Grr
  • 15,553
  • 7
  • 65
  • 85