Pandas - get row and column name for each element during applymap

Question

I am trying to compare two lists of strings for similarity and get the results in a pandas DataFrame for inspection, so I use one list as the index and the other as the columns. I then want to compute the "Levenshtein similarity" for each pair (a function that compares the similarity between two words).

I am trying to do that using applymap on every cell, comparing the cell's index label to its column label. How could I do that? Or are there simpler alternatives?

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index = things, columns = action)

def lev(x):
    x = Levenshtein.distance(x.index, x.column)  
matrix.applymap(lev)

So far I have resorted to the following, but I find it clumsy and slow:

matrix = pd.DataFrame(data = [action for i in things], index = things, columns = action)
for i, values in matrix.iterrows():
    for j, value in enumerate(values):
        # .ix has been removed from pandas; address the cell by label with .loc
        matrix.loc[i, matrix.columns[j]] = Levenshtein.distance(i, value)

score 17 · Accepted Answer · edited Mar 19 '22 at 22:41

I think you can use apply on the DataFrame. The function then receives each column as a Series, so x.index gives the row labels and x.name gives the column label:

def lev(x):
    # replace this with your own function
    return x.index + x.name
a = matrix.apply(lev)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
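
To compute an actual distance rather than concatenating labels, the placeholder can return a Series built from the row labels and the column label. The following is a minimal sketch, not the answerer's code: `edit_distance` is a hypothetical pure-Python stand-in for `Levenshtein.distance`, included only so the snippet runs without extra packages.

```python
import pandas as pd


def edit_distance(a, b):
    """Levenshtein distance between two strings (pure-Python stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)


def lev(col):
    # col.name is the column label; col.index holds the row labels
    return pd.Series([edit_distance(row, col.name) for row in col.index],
                     index=col.index)


distances = matrix.apply(lev)
print(distances)
```

If the `Levenshtein` package is installed, `edit_distance` could presumably be swapped for `Levenshtein.distance` without other changes.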

EDIT:

If you need an arithmetic operation, use broadcasting:

# the index is turned into a column vector so that rows follow the index
# and columns follow the column labels
a = pd.DataFrame(matrix.index.values[:, None] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating

Or:

a = pd.DataFrame(matrix.index.values[:, np.newaxis] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
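
For an arbitrary Python function rather than `+`, the same broadcast shape can be reused with `np.vectorize`. This is a sketch under assumptions: `similarity` is a hypothetical helper based on the standard library's `difflib.SequenceMatcher`, standing in for a Levenshtein-based score. Note that `np.vectorize` is a convenience wrapper around a Python-level loop, so it tidies the code more than it speeds it up.

```python
import difflib

import numpy as np
import pandas as pd

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']


def similarity(a, b):
    # hypothetical stand-in for a Levenshtein-based similarity score
    return difflib.SequenceMatcher(None, a, b).ratio()


# rows as a column vector (4, 1), columns as a flat vector (4,):
# broadcasting pairs every row label with every column label
rows = np.array(things, dtype=object)[:, None]
cols = np.array(action, dtype=object)

scores = pd.DataFrame(np.vectorize(similarity, otypes=[float])(rows, cols),
                      index=things,
                      columns=action)
print(scores)
```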

edited Mar 19 '22 at 22:41 by smci
answered Apr 27 '17 at 10:12 by jezrael

brilliant - thank you. but would there be no way to achieve that as a vectorized/elementwise operation? What you do is essentially what I've been doing with iterrows – jim jarnac Apr 27 '17 at 10:17
It depends on what you need. If you need only an arithmetic operation, use numpy. – jezrael Apr 27 '17 at 10:25
I see, thanks, but I am not after an arithmetic operation; rather, I want to apply a simple function to each cell, as in the question. I thought applymap, being an elementwise method, would do. – jim jarnac Apr 27 '17 at 10:26
Yes, but there is no connection with index and column values in `applymap` - check it with `a.applymap(lambda x: print (x))` – jezrael Apr 27 '17 at 10:28
OK, so I understand I would have to iterate with apply / iterrows anyway. A bit disappointing, but thx – jim jarnac Apr 27 '17 at 10:31
Not sure, but according to [`this`](http://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316), apply can be faster. But if some function is applied in each loop, performance may be the same. – jezrael Apr 27 '17 at 10:33
I'm confused, I get AttributeError: 'int' object has no attribute 'name' and AttributeError: 'int' object has no attribute 'index' when I try `return x.index + x.name`. My dataframe is made of integers. – rkian Jun 19 '21 at 10:40
@rkian: you have a separate issue, please post a separate question, cite this question and answer in a link, and we'll gladly look at it. Most likely, if you get *"object has no attribute 'name'"*, then you accessed a single cell, not a series, so probably you tried to `.apply()` on a single series, not a dataframe. – smci Mar 19 '22 at 22:39
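
The error in the last two comments comes from what each method passes to the function. A small sketch with a made-up integer DataFrame (the frame and the list names are illustrative only): `DataFrame.apply` hands over whole columns as Series, which carry `.name` and `.index`, whereas `Series.apply` hands over the individual cell values, which do not.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# DataFrame.apply: the function is called once per column with a Series
seen_by_dataframe_apply = []
df.apply(lambda col: seen_by_dataframe_apply.append(type(col).__name__))

# Series.apply: the function is called once per cell with a scalar value
seen_by_series_apply = []
df["a"].apply(lambda cell: seen_by_series_apply.append(type(cell).__name__))

print(seen_by_dataframe_apply)
print(seen_by_series_apply)
```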

score 9 · Answer 2 · answered Sep 04 '18 at 04:02

You can do that by "nested apply" as follows:

import pandas as pd
from Levenshtein import distance as LD  # assumption: LD refers to Levenshtein.distance

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))

Output:

          walking  caring  biking  eating
car             6       3       6       5
bike            6       5       3       5
sidewalk        7       8       7       8
eatery          6       5       6       3

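The call pd.DataFrame(x) is needed because x is a Series, and Series.apply behaves like applymap: it passes only the cell values to the function, without index or column information. Wrapping x in a DataFrame and applying with axis=1 passes each row as a Series whose name is the row label.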
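
A minimal sketch of which labels are visible at each level of the nested apply. `shared_letters` is a hypothetical stand-in for `LD`, so the snippet needs nothing beyond pandas: the outer function receives a column (`x.name` is the column label), and with `axis=1` the inner function receives a row (`y.name` is the row label).

```python
import pandas as pd


def shared_letters(a, b):
    # hypothetical pairwise function: number of distinct letters in common
    return len(set(a) & set(b))


things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

result = matrix.apply(
    # x: one column as a Series, x.name is the column label
    lambda x: pd.DataFrame(x).apply(
        # y: one row of the single-column frame, y.name is the row label
        lambda y: shared_letters(x.name, y.name),
        axis=1,
    )
)
print(result)
```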

score 0 · Answer 3 · answered Jun 19 '23 at 13:35

Here is a combination of apply and comprehension:

def mapping_function(value, index, column_name):
    # this is called for each cell
    mapping_result = column_name + '|' + str(index) + '|' + str(value)
    return mapping_result


def _column_mapping_function(column_series):
    column_name = column_series.name
    new_series_data = [mapping_function(value, index, column_name) for index, value in column_series.items()]
    new_series = pd.Series(data=new_series_data, index=column_series.index)
    return new_series
    
result = indexed_data_frame.apply(_column_mapping_function)
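
The snippet above leaves `indexed_data_frame` undefined. A possible usage, assuming a small made-up frame (the data and the condensed `map_with_labels` helper are illustrative, not part of the answer):

```python
import pandas as pd


def mapping_function(value, index, column_name):
    # called once per cell with its value, row label and column label
    return column_name + '|' + str(index) + '|' + str(value)


def map_with_labels(frame, func):
    # condensed form of the column-wise apply plus comprehension
    return frame.apply(
        lambda col: pd.Series([func(value, index, col.name)
                               for index, value in col.items()],
                              index=col.index)
    )


indexed_data_frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
result = map_with_labels(indexed_data_frame, mapping_function)
print(result)
```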
