Create a Dataframe from a list and keep duplicate items

Question

I have a list of dataframes. Each dataframe in the list is different: they share some columns, but each also has columns the others lack. I would like to create a single dataframe that contains every column from all of them, filled with NaN wherever a dataframe has no value for a column. I have tried the following:

import pandas as pd
df_new = pd.concat(list_of_dfs)
#I get the following: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The issue seems to be due to the dataframes in the list. Each dataframe has only one row, so its index is zero, and thus reindexing will not do the trick. I have tried this:

 list_of_dfs.append(pd.DataFrame([rows], columns = tags).set_index(np.array(random.randint(0,5000))))

In effect, this generates a random number to use as the index. However, I get this error:

ValueError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.

Does this answer your question? [Concat dataframe reindexing only valid with uniquely valued index objects](https://stackoverflow.com/questions/35084071/concat-dataframe-reindexing-only-valid-with-uniquely-valued-index-objects) — mozway, Jul 11 '21 at 21:09
Could you reset the index of each dataframe then concat and set index back? — Henry Ecker, Jul 11 '21 at 21:12

score 1 · Accepted Answer · answered Jul 11 '21 at 21:13

You need to pass a couple of parameters to pd.concat, namely axis=0 and ignore_index=True:

import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3],'x':[4,5,6],'y':[7,8,9]})
df2 = pd.DataFrame({'b':[10,11,12],'x':[13,14,15],'y':[16,17,18]})

print(pd.concat([df1,df2], axis=0, ignore_index=True))

Result:

     a   x   y     b
0  1.0   4   7   NaN
1  2.0   5   8   NaN
2  3.0   6   9   NaN
3  NaN  13  16  10.0
4  NaN  14  17  11.0
5  NaN  15  18  12.0

So, call concat like this:

pd.concat(list_of_dfs, axis=0, ignore_index=True)
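For the one-row frames described in the question, a minimal sketch might look like the following. The frames and column names here are hypothetical stand-ins, since the question does not show the real data. The sketch assumes every frame has unique column labels.

```python
import pandas as pd

# Hypothetical stand-ins for the question's one-row frames: each has index 0
# and a partly different set of columns.
list_of_dfs = [
    pd.DataFrame([[1, 2]], columns=["a", "x"]),
    pd.DataFrame([[3, 4]], columns=["b", "x"]),
    pd.DataFrame([[5, 6, 7]], columns=["a", "b", "y"]),
]

# ignore_index=True discards the repeated 0 labels and numbers the rows 0..n-1.
# Columns that a frame does not have are filled with NaN for that frame's row.
df_new = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(df_new)
```

With ignore_index=True there is no need to generate a random index for each frame before appending it to the list.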

magicarm22

FWIW `pd.concat([df1,df2])` wouldn't raise an `InvalidIndexError` for this example and would work as expected. – Henry Ecker Jul 11 '21 at 21:15
@magicarm22 I have updated the post with more info – GK89 Jul 11 '21 at 23:44

score 0 · Answer 2 · answered Jul 11 '21 at 21:16

How about trying this:

If your indices are already unique, this should not hurt them:

df = df.loc[~df.index.duplicated(keep='first')]

Otherwise, it makes them unique by keeping only the first row for each repeated label. You can also set axis to 'index' (equivalent to the default, axis=0) so that the frames are stacked along the index:

df_new = pd.concat(list_of_dfs, axis='index')
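The check above looks at row labels. The same duplicated() method exists on column labels, and repeated column names inside a single frame are one possible cause of the InvalidIndexError (an assumption here, since the question does not show the actual frames). A rough diagnostic sketch, with a hypothetical helper name and made-up frames:

```python
import pandas as pd

def frames_with_duplicate_columns(frames):
    """Return (position, repeated labels) for each frame whose column labels repeat.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    report = []
    for pos, frame in enumerate(frames):
        # Index.duplicated() marks every occurrence of a label after the first.
        dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
        if dupes:
            report.append((pos, dupes))
    return report

# Made-up example: the first frame has two columns named "x".
list_of_dfs = [
    pd.DataFrame([[1, 2, 3]], columns=["a", "x", "x"]),
    pd.DataFrame([[4, 5]], columns=["b", "x"]),
]

report = frames_with_duplicate_columns(list_of_dfs)
print(report)  # [(0, ['x'])]
```

If the report is non-empty, the repeated columns in those frames would need to be renamed or dropped before concatenating. Which of the two is appropriate depends on whether the duplicate values should be kept.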

Bilal Qandeel

I have updated the post with more info – GK89 Jul 11 '21 at 23:43
