Problems joining two Pandas Dataframes

Question

I'm trying to create a report of the cards I have in Trello through the REST API. The report needs to show the card data and the names of the members assigned to each card.

The problem is that the Trello JSON is very cumbersome, and I need to make several queries, and then merge the different data frames.

I'm currently stuck, trying to add the cardmember names to the main card data frame.

I'm sending you a summary of the problem:

I have created the main data frame (trello_dataframe), which holds card-level information from Trello, including the "ID Members" column (trello_dataframe['ID Members']). That column holds lists, and I need to merge on it with another data frame.

More info about trello_dataframe: https://prnt.sc/boC6OL50Glwu

The second data frame (df_response_members) results from the query at the board member level and has 3 columns: ID Members (df_response_members['ID Members']), FullName (df_response_members['Member (Full Name)']), and Username (df_response_members['Member (Username)']).

More info about "df_response_members": https://prnt.sc/x6tmzI04rohs

Now I want to merge these two data frames, grouped by df_response_members['ID Members'], so that the full name and username of the card members appear in the card data frame (it's the main one).

The problem occurs when I try to merge the two data frames, with the following code, and I get the error

TypeError: unhashable type: 'list'.

at

trello_dataframe = pd.merge(df_response_members,trello_dataframe, on="ID Members", how='outer')

Here is how I would like to see the main data frame: https://prnt.sc/7PSTmG2zahZO

Thank you in advance!

Please use code tags (one backtick for inline, three backticks for multi-line). Otherwise, it's too hard for us to read your code. — Zorgoth, May 23 '22 at 17:18

score 1 · Accepted Answer · answered May 23 '22 at 18:17


You can't do that for two reasons: A) as the error says, lists aren't hashable, and DataFrame operations typically don't work on unhashable data types; and B) you are trying to merge a list column with a string column. Both key columns must have the same type for a merge to work.
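A minimal sketch of point A, using made-up sample data (the `cards` frame and its values are hypothetical stand-ins for trello_dataframe). It only shows that each cell of the column is a Python list and that hashing such a cell raises the same message as in the question:

```python
import pandas as pd

# Hypothetical sample data; the real frame is built from the Trello API.
cards = pd.DataFrame({
    "Card": ["Card A", "Card B"],
    "ID Members": [["m1", "m2"], ["m2"]],
})

# Every cell in the key column is a Python list.
cell_types = cards["ID Members"].map(type).unique().tolist()

# Hashing one of those cells fails, which is what a merge on this
# column runs into.
try:
    hash(cards.loc[0, "ID Members"])
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

print(cell_types)
print(hash_error)
```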

One solution is to first call DataFrame.explode() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html) on your first DataFrame, trello_dataframe, using the 'ID Members' column. This generates a separate row for each member ID in each list. You can then perform your merge with the exploded DataFrame.
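A small sketch of the explode-then-merge step, assuming two tiny hypothetical frames (`cards` and `members`) that mimic trello_dataframe and df_response_members. The column names follow the question; the values are invented. It uses how="left" so that every card row is kept, which is an assumption about the desired result; the question used how='outer'.

```python
import pandas as pd

# Hypothetical stand-in for trello_dataframe.
cards = pd.DataFrame({
    "Card": ["Card A", "Card B"],
    "ID Members": [["m1", "m2"], ["m2"]],
})

# Hypothetical stand-in for df_response_members.
members = pd.DataFrame({
    "ID Members": ["m1", "m2"],
    "Member (Full Name)": ["Ann Lee", "Bob Ray"],
    "Member (Username)": ["annlee", "bobray"],
})

# One row per (card, member ID) pair; the key column now holds strings.
exploded = cards.explode("ID Members")

# The key columns have matching types, so the merge works.
merged = exploded.merge(members, on="ID Members", how="left")

print(merged)
```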

To convert back to your desired format you can use GroupBy, as described in "How to implode (reverse of pandas explode) based on a column".
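A sketch of the reverse step, assuming an already exploded and merged frame with one row per card and member (the `merged` frame here is hypothetical sample data). It groups by the card-level column, collects the IDs back into a list, and joins the names with commas:

```python
import pandas as pd

# Hypothetical result of the explode + merge step.
merged = pd.DataFrame({
    "Card": ["Card A", "Card A", "Card B"],
    "ID Members": ["m1", "m2", "m2"],
    "Member (Full Name)": ["Ann Lee", "Bob Ray", "Bob Ray"],
})

# Group by every column that is NOT being aggregated (only "Card" here).
imploded = (
    merged.groupby("Card", as_index=False, sort=False)
    .agg({
        "ID Members": list,                # back to a list per card
        "Member (Full Name)": ", ".join,   # names on one line
    })
)

print(imploded)
```

With real data, the groupby key should be the list of all card-level columns; leaving some out drops them from the result.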


scespinoza


The "explode" function works, with this code: `trello_dataframe = trello_dataframe.explode('ID Members', ignore_index=False)` followed by `trello_dataframe = pd.merge(df_response_members, trello_dataframe, on="ID Members", how='outer')`. Now the dataframe looks like this: https://prnt.sc/xC3s-VrJDmbX But I would need the names to appear on the same line, separated by commas: https://prnt.sc/7PSTmG2zahZO Any suggestions? Thanks! – Junior P May 23 '22 at 22:18
As I said in the answer, you should use GroupBy to reverse the explode operation you just did. Code might look something like `df.groupby(other_columns).agg({'ID Members': lambda x: x.tolist(), 'Member (Full Name)': lambda x: ', '.join(x.tolist())})`, where `other_columns` should be a list containing the names of all the columns you are not aggregating. I omitted 'Member (Username)' for brevity. – scespinoza May 24 '22 at 00:30
Hello, I have grouped with the indicated code, but the values are duplicated. Is there a way to remove the duplicates of each row? https://prnt.sc/2b8iwjYRo7PU – Junior P May 24 '22 at 16:08
That sounds like you are missing some columns in the `other_columns` list. Can you update your question with the code you are using now? – scespinoza May 24 '22 at 18:18
Hi, I have found how to remove duplicate values. Here is the post: https://stackoverflow.com/questions/72366340/remove-duplicate-values-in-each-row-of-the-column/72366520#72366520 Thanks! – Junior P May 24 '22 at 21:21
Glad you managed to make it work! I was looking at your first table and noticed that the duplicate values were already there. Look how the first 4 rows show the same information. It would be better to remove the duplicates before exploding. – scespinoza May 24 '22 at 22:11
