Questions tagged [pandas-merge]

Use this tag for questions related to the Pandas merge function or the merge method of a Pandas DataFrame object.

merge is a function (or method) used to merge DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise, if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
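As a rough illustration of the difference described above, the following minimal sketch joins two small frames first on a column and then on their indexes. The frame names, column names, and values are made up for the example and do not come from any question on this page.

```python
import pandas as pd

# Hypothetical sample frames; the index values are chosen arbitrarily
# so that it is visible whether they survive the merge.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]}, index=[10, 11, 12])
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [4, 5, 6]}, index=[20, 21, 22])

# Column-on-column join: the original indexes (10.., 20..) are ignored
# and the result gets a fresh default integer index.
on_columns = pd.merge(left, right, on="key", how="inner")

# Index-on-index join: the index values act as the join keys
# and are carried over into the result.
left_i = left.set_index("key")
right_i = right.set_index("key")
on_indexes = pd.merge(left_i, right_i, left_index=True, right_index=True, how="inner")

print(on_columns)
print(on_indexes)
```

In the first result the matching rows are renumbered from zero, while in the second the shared key values remain as the index of the merged frame.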

41 questions
349
votes
6 answers

pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas dataframes using two columns: new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]') but got the following error: pandas/index.pyx in pandas.index.IndexEngine.get_loc…
Edamame
  • 23,718
  • 73
  • 186
  • 320
38
votes
4 answers

Preserve Dataframe column data type after outer merge

When you merge two indexed dataframes on certain values using 'outer' merge, python/pandas automatically adds Null (NaN) values to the fields it could not match on. This is normal behaviour, but it changes the data type and you have to restate what…
Jeff
  • 551
  • 1
  • 5
  • 19
3
votes
1 answer

DASK: merge throws error when one side's key is NA whereas pd.merge works

I have these sample dataframes: tdf1 = pd.DataFrame([{"id": 1, "val": 4}, {"id": 2, "val": 5}, {"id": 3, "val": 6}, {"id": pd.NA, "val": 7}, {"id": 4, "val": 8}]) tdf2 = pd.DataFrame([{"some_id": 1, "name": "Josh"}, {"some_id": 3, "name":…
Jorge Cespedes
  • 547
  • 1
  • 11
  • 21
2
votes
3 answers

Insert/replace/merge values from one dataframe to another

I have two dataframes like this: df1 = pd.DataFrame({'ID1':['A','B','C','D','E','F'], 'ID2':['0','10','80','0','0','0']}) df2 = pd.DataFrame({'ID1':['A','D','E','F'], 'ID2':['50','30','90','50'], …
Jiao
  • 219
  • 1
  • 10
1
vote
0 answers

Pandas dataframe unique values column merge

I'm still learning data manipulation with Pandas and this is the problem which occurred: Trying to merge two dataframes: Creating the first one with a string filter df_mask1 = df[eval(mask1)].groupby(['Country', 'ID']).agg({'Serial':…
whoisagp
  • 23
  • 7
1
vote
2 answers

Duplicates, and strange values after merging dataframes

I am using Python 3. I have a master dataframe " df " with the columns as shown (with 3 rows of sample data): UNITID CIPCODE AWLEVEL CTOTALT 100654 1.0999 5 9 100654 1.1001 5 10 100654 1.1001 7 6 I have a dataframe…
analyst92
  • 243
  • 1
  • 6
1
vote
1 answer

How to add sub-total columns to a multilevel columns dataframe?

I've a dataframe with 3 levels of multi index columns: quarter Q1 Q2 Totals year 2021 2022 2021 2022 qty orders qty…
Judy T Raj
  • 1,755
  • 3
  • 27
  • 41
1
vote
1 answer

PANDAS How to include data in a MERGE that has missing data in some ROWS

I have two dataframes. One called SERVICES and one called TIMES. I am joining them together like so: servicesMerged = pd.merge(services, times, left_on='Ref_Id', right_on='Ref_ID') This is fine and works, except some of the TIMES data is missing a…
robster
  • 626
  • 1
  • 7
  • 22
1
vote
3 answers

Pandas convert dummies to a new column

I have a dataframe that discretize the customers into different Q's, which looks like: CustomerID_num Q1 Q2 Q3 Q4 Q5 Country 0 12346 1 0 0 0 0 United Kingdom 2 12347 0 0 0 0 1 Iceland 9 12348 …
WilliamL
  • 13
  • 2
1
vote
1 answer

Merge rows in a Pandas Dataframe filling NaN values and removing duplicates

I'm trying to clean a Python Pandas dataframe that contains dirty data with "repeated" (but not exactly duplicated) people information. id name name2 name3 email 1 A A A email@gmail.com 1 A NaN NaN NaN NaN…
Paolo Magnani
  • 549
  • 4
  • 14
1
vote
1 answer

Change columns to rows per student ID

I have data in excel sheet that I am reading into a dataframe: ID Grade Course Q1 Number Q1 Letter Q2 Number Q2 Letter 1 9 English 73 B 69 C 1 9 Math 70 B 52 C 1 9 Science 69 C 80 A desired output: ID Grade Course Semester Number…
1
vote
3 answers

Can you group multiple rows all into one row by column value with Python using pandas?

How do I change this: Date URL Description Category 2022-06-17 14:24:52 /XYBkLO public A 2022-06-17 14:24:52 /XYBkLO public B 2022-06-17 14:24:52 /XYBkLO public C 2022-06-17 14:25:05 /ZWrTVu public A 2022-06-17 …
Nahuel Mel
  • 51
  • 8
1
vote
1 answer

how to merge Two datasets with different time ranges?

I have two datasets that look like…
RafaelP
  • 83
  • 7
1
vote
1 answer

Python: Selecting column values from multiple columns dynamically

I have 2 dataframes. The first is a summary table that summarizes the accuracy (in descending order) of each industry and its source. cols = ['industry', 'source', 'accuracy'] df = pd.DataFrame(np.array([ ['chemical', 'source B', 0.9], …
Theol
  • 13
  • 3
0
votes
1 answer

pandas' dataframes merge challenge with identical strings but different unicodes

I have a problem using pd.merge when some of the rows in the two columns in the two datasets I use to merge the two datasets have different unicodes even though the strings are identical. Here is one example: I have two datasets data1 and data2 both…