
I have a list of dataframes. Each dataframe in the list is unique, meaning that the dataframes share some columns but each also has columns the others lack. I would like to create a single dataframe that contains all of the columns from the list of dataframes and is filled with NaN wherever a value is not present. I have tried the following:

import pandas as pd
df_new = pd.concat(list_of_dfs)
#I get the following: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The issue seems to be due to the dataframes in the list. Each dataframe has only one row, so its index is zero, and thus reindexing will not do the trick. I have tried this:

 list_of_dfs.append(pd.DataFrame([rows], columns = tags).set_index(np.array(random.randint(0,5000))))

This pretty much generates a random number to use as the index. However, I get this error:

ValueError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.
GK89
  • Does this answer your question? [Concat dataframe reindexing only valid with uniquely valued index objects](https://stackoverflow.com/questions/35084071/concat-dataframe-reindexing-only-valid-with-uniquely-valued-index-objects) – mozway Jul 11 '21 at 21:09
  • Could you reset the index of each dataframe then concat and set index back? – Henry Ecker Jul 11 '21 at 21:12

2 Answers


You need to pass a few parameters to pd.concat:

import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3],'x':[4,5,6],'y':[7,8,9]})
df2 = pd.DataFrame({'b':[10,11,12],'x':[13,14,15],'y':[16,17,18]})

print(pd.concat([df1,df2], axis=0, ignore_index=True))

Result:

     a   x   y     b
0  1.0   4   7   NaN
1  2.0   5   8   NaN
2  3.0   6   9   NaN
3  NaN  13  16  10.0
4  NaN  14  17  11.0
5  NaN  15  18  12.0

So, call concat like this:

pd.concat(list_of_dfs, axis=0, ignore_index=True)
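For the situation described in the question, where every dataframe has a single row with index 0, a minimal sketch could look like the one below. The column names and values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical one-row dataframes with partly overlapping columns;
# each one has the default index [0]
list_of_dfs = [
    pd.DataFrame([[1, 2]], columns=['a', 'x']),
    pd.DataFrame([[3, 4]], columns=['b', 'x']),
    pd.DataFrame([[5, 6, 7]], columns=['a', 'b', 'y']),
]

# ignore_index=True discards each frame's own row index and
# assigns a fresh 0..n-1 index to the combined result;
# columns missing from a given frame are filled with NaN
df_new = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(df_new)
```

The result has one row per input dataframe and the union of all column names, so no random index needs to be generated beforehand.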
magicarm22

How about trying this:

If your indices are already unique, this will not change them; otherwise it keeps only the first row for each repeated index label, so the index ends up unique:

df = df.loc[~df.index.duplicated(keep='first')]

You can also set axis to 'index' (equivalent to axis=0) so that the dataframes are stacked along the index:

df_new = pd.concat(list_of_dfs, axis='index')
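As a rough sketch of how the two steps above might be combined, the filter can be applied to every dataframe in the list before concatenating. The dataframes here are hypothetical and exist only to show the mechanics.

```python
import pandas as pd

# Hypothetical frame whose index label 0 appears twice
df = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 0, 1])

# Keep only the first row for each index label
df_unique = df.loc[~df.index.duplicated(keep='first')]

# Apply the same filter to every frame in the list, then stack row-wise
list_of_dfs = [df, pd.DataFrame({'b': [4]}, index=[0])]
cleaned = [d.loc[~d.index.duplicated(keep='first')] for d in list_of_dfs]
df_new = pd.concat(cleaned, axis='index')
print(df_new)
```

Note that this only removes duplicates within each dataframe. The combined index can still repeat a label that occurs in more than one dataframe, unless ignore_index=True is also passed.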
Bilal Qandeel