From the following Pandas dataframe (actually a distance matrix):

        foo   foo   bar   bar   spam  spam
foo     0.00  0.35  0.83  0.84  0.90  0.89
foo     0.35  0.00  0.86  0.85  0.92  0.91
bar     0.83  0.86  0.00  0.25  0.88  0.87
bar     0.84  0.85  0.25  0.00  0.82  0.86
spam    0.90  0.92  0.88  0.82  0.00  0.50
spam    0.89  0.91  0.87  0.86  0.50  0.00
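
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the names `labels` and `values` are mine, chosen for illustration, and the numbers are copied from the table.

```python
import numpy as np
import pandas as pd

# Row/column labels are deliberately duplicated, as in the table above.
labels = ["foo", "foo", "bar", "bar", "spam", "spam"]

# Symmetric distance matrix with zeros on the diagonal.
values = np.array([
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
])

df = pd.DataFrame(values, index=labels, columns=labels)
```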

I was trying to create lists deriving from all combinations of ['foo','bar','spam'], to obtain the following lists with unique values:

foo_foo = [0.35]
foo_bar = [0.83,0.84,0.86,0.85]
foo_spam = [0.90,0.89,0.92,0.91]
bar_bar = [0.25]
bar_spam = [0.88,0.87,0.82,0.86]
spam_spam = [0.50]

I tried df.get_values and iterrows without success, and the answers to "How to get a value from a cell of a data frame?" and "pandas: how to get scalar value on a cell using conditional indexing" did not help either.

Is there a way to achieve this? Any help would be appreciated.

valeten

2 Answers

IIUC:

In [93]: from itertools import combinations

In [94]: s = pd.Series(df.values[np.triu_indices(len(df), 1)],
    ...:               index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))
    ...:

In [95]: s
Out[95]:
foo   foo     0.35
      bar     0.83
      bar     0.84
      spam    0.90
      spam    0.89
      bar     0.86
      bar     0.85
      spam    0.92
      spam    0.91
bar   bar     0.25
      spam    0.88
      spam    0.87
      spam    0.82
      spam    0.86
spam  spam    0.50
dtype: float64

as a DF:

In [96]: s.reset_index(name='dist')
Out[96]:
   level_0 level_1  dist
0      foo     foo  0.35
1      foo     bar  0.83
2      foo     bar  0.84
3      foo    spam  0.90
4      foo    spam  0.89
5      foo     bar  0.86
6      foo     bar  0.85
7      foo    spam  0.92
8      foo    spam  0.91
9      bar     bar  0.25
10     bar    spam  0.88
11     bar    spam  0.87
12     bar    spam  0.82
13     bar    spam  0.86
14    spam    spam  0.50
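
The reason the values and the index labels line up is that `np.triu_indices(n, 1)` and `combinations(range(n), 2)` both walk the pairs (i, j) with i < j in the same row-by-row order. A small check of that claim, as a sketch (the variable names `n`, `triu_pairs` and `combo_pairs` are illustrative only):

```python
from itertools import combinations

import numpy as np

n = 6  # size of the distance matrix in the question

# Positions strictly above the diagonal, in row-major order.
rows, cols = np.triu_indices(n, 1)
triu_pairs = [(int(i), int(j)) for i, j in zip(rows, cols)]

# All index pairs (i, j) with i < j, in lexicographic order.
combo_pairs = list(combinations(range(n), 2))

same_order = triu_pairs == combo_pairs
print(same_order)
```

Because the two sequences match position by position, each upper-triangle value is paired with the labels of its own row and column.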
MaxU - stand with Ukraine

Let's take MaxU's solution a step further (credit to his answer):

from itertools import combinations

import numpy as np
import pandas as pd

s = pd.Series(df.values[np.triu_indices(len(df), 1)],
      index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))

df_s = s.to_frame()

df_s.index = df_s.index.map('_'.join)

# Keep the result so it can be reused below
df_out = df_s.groupby(level=0)[0].apply(lambda x: x.tolist())
df_out

Output:

bar_bar                        [0.25]
bar_spam     [0.88, 0.87, 0.82, 0.86]
foo_bar      [0.83, 0.84, 0.86, 0.85]
foo_foo                        [0.35]
foo_spam      [0.9, 0.89, 0.92, 0.91]
spam_spam                       [0.5]
Name: 0, dtype: object

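And lastly, printing:

# df_out is the grouped Series produced by the groupby step above;
# Series.iteritems() was removed in pandas 2.0, so use items()
for i, v in df_out.items():
    print(str(i) + ' = ' + str(v))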

Output:

bar_bar = [0.25]
bar_spam = [0.88, 0.87, 0.82, 0.86]
foo_bar = [0.83, 0.84, 0.86, 0.85]
foo_foo = [0.35]
foo_spam = [0.9, 0.89, 0.92, 0.91]
spam_spam = [0.5]
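
One caveat: joining the labels in the order `combinations` yields them only works here because identical labels sit next to each other in the index. If the labels were interleaved, the same pair could show up as both `foo_bar` and `bar_foo`. Below is a rough sketch of a plain-loop variant that normalises the pair order first. The function name `pairwise_lists` and its `order` parameter are hypothetical, not part of pandas, and the tiny frame is made-up data used only to show the interleaved case.

```python
from collections import defaultdict
from itertools import combinations

import pandas as pd


def pairwise_lists(df, order):
    """Collect upper-triangle values into lists keyed by 'a_b'.

    `order` (hypothetical parameter) fixes which label comes first in a key.
    """
    rank = {name: pos for pos, name in enumerate(order)}
    labels = list(df.index)
    out = defaultdict(list)
    # Walk positions, not labels, because the labels are duplicated.
    for i, j in combinations(range(len(labels)), 2):
        a, b = sorted((labels[i], labels[j]), key=rank.__getitem__)
        out[f"{a}_{b}"].append(float(df.iat[i, j]))
    return dict(out)


# Made-up example with interleaved labels.
labels = ["foo", "bar", "foo"]
small = pd.DataFrame(
    [[0.0, 0.8, 0.3],
     [0.8, 0.0, 0.9],
     [0.3, 0.9, 0.0]],
    index=labels, columns=labels,
)
result = pairwise_lists(small, order=["foo", "bar"])
print(result)
```

This returns a plain dict rather than a Series, which may be more convenient if separate named lists are the end goal.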
Scott Boston