1

I have 3 dataframes including the information from the same group, now I'm trying to concate these dataframes by their group, by set_index as the group name, but because df1 contains indices that are not unique, hence I'm not able to concate them. Is there any way to bypass that?

samples of inputs df:

df1:
group     A       B
 cat      1       0 
 cat      2       7
 cat      5       5
 dog      0.4     1
 dog      2       4
 dog      8       7 
 seal     7       5
 seal     1       8
 seal     7       9

df2:
group     C       D
 cat      1       3
 seal     0       5    
 dog      3       4

df3:
group     E       F
 cat      1       5
 dog      0       3 
 seal     5       9

wanted ouputs:

group     A       B       C        D       E      F
 cat      1       0       1        3       1      5
 cat      2       7       1        3       1      5
 cat      5       5       1        3       1      5
 dog      0.4     1       3        4       0      3
 dog      2       4       3        4       0      3
 dog      8       7       3        4       0      3 
 seal     7       5       0        5       5      9
 seal     1       8       0        5       5      9
 seal     7       9       0        5       5      9

my code:

 df1 = pd.read(file).set_index('group')
 df2 = pd.read(file).set_index('group')
 df3 = pd.read(file).set_index('group')

 all_data = pd.concate(df1, df2, df3, axis = 1).reset_index()

error:

 pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

thanks!

Lumos
  • 1,303
  • 2
  • 17
  • 32

1 Answers1

1

I think you can use concat of df2 and df3 first if same sizes and then join:

df = pd.concat([df2.set_index('group'), df3.set_index('group')], axis = 1)
all_data = df1.join(df, on='group')
print (all_data)
  group    A  B  C  D  E  F
0   cat  1.0  0  1  3  1  5
1   cat  2.0  7  1  3  1  5
2   cat  5.0  5  1  3  1  5
3   dog  0.4  1  3  4  0  3
4   dog  2.0  4  3  4  0  3
5   dog  8.0  7  3  4  0  3
6  seal  7.0  5  0  5  5  9
7  seal  1.0  8  0  5  5  9
8  seal  7.0  9  0  5  5  9

Also is possible use parameter index_col in read_csv instead set_index:

df1 = pd.read(file)
df2 = pd.read(file, index_col='group')
df3 = pd.read(file, index_col='group')

df = pd.concat([df2, df3], axis = 1)
all_data = df1.join(df, on='group')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thanks! This is a good way to do it for a small number of dataframes. But what if a large number of dataframes and may require `concate` and `join` for many times, do you think that is there any way to `concate` all together at the same time and bypass this `unique value` error? Thanks!! – Lumos Jul 25 '17 at 05:24
  • I think `concat` works with multiple dataframes perfectly and if need multiple join is possible use similar approach with `merge(how='left')` by [this](https://stackoverflow.com/a/30512931/2901002) answer. – jezrael Jul 25 '17 at 05:38
  • 1
    thx, the post is helpful! I actually prefer `merge`, because `join` seems have requirement of common values at the same index position. And I didn't see the difference between `merge` and `join` so far... Thanks! – Lumos Jul 25 '17 at 06:03
  • 1
    Glad can help! Nice day! – jezrael Jul 25 '17 at 06:06