
I am using a custom function in pandas that iterates over the cells in a dataframe, finds the row with the same index in a different dataframe, extracts it as a tuple, picks a random value from that tuple, adds a user-specified amount of noise to the value, and writes it back to the original dataframe. Is there a way to do this with applymap? I couldn't find one, so I used itertuples, but I expect an applymap solution would be more efficient.

import random

import numpy as np
import pandas as pd
# Mock data creation
key = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4,5,6], 'col3':[7,8,9]})
results = pd.DataFrame(np.zeros((3,3)))

def apply_value(value):    
    key_index = # <-- THIS IS WHERE I NEED A WAY TO ACCESS INDEX
    key_tup = key.iloc[key_index]
    length = (len(key_tup) - 1)
    random_int = random.randint(1, length)
    random_value = key_tup[random_int]
    return random_value

results = results.applymap(apply_value)
smci
  • Existing Q&A: [Using Pandas to "applymap" with access to index/column?](https://stackoverflow.com/questions/39773833/using-pandas-to-applymap-with-access-to-index-column), [Pandas - get row and column name for each element during applymap](https://stackoverflow.com/questions/43654727/pandas-get-row-and-column-name-for-each-element-during-applymap) – smci Mar 19 '22 at 22:43
  • a) Your code boils down to just doing `np.random.choice(key_tup)` b) You need to fix your example data, if `key` is really supposed to be a dataframe of tuples (not just values). – smci Mar 19 '22 at 23:00

2 Answers


If I understood your problem correctly, this piece of code should work. The problem is that applymap passes only each cell's value to your function, not its index, so you have to nest two apply calls: the outer one iterates over the rows (where the row label is available as x.name), and the inner one iterates over the cells in each row. Hope it helps. Let me know if it does :D

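import random

import numpy as np
import pandas as pd

# Mock data creation
key = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
results = pd.DataFrame(np.zeros((3, 3)))

def apply_value(value, key_index):
    key_tup = key.loc[key_index]  # row of key with the same index label
    length = len(key_tup) - 1
    random_int = random.randint(1, length)  # bounds are inclusive, so position 0 is never picked
    random_value = key_tup.iloc[random_int]  # positional lookup into the row
    return random_value

# Outer apply walks the rows (x.name is the row label); inner apply walks the cells
results = results.apply(lambda x: x.apply(lambda d: apply_value(d, x.name)), axis=1)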
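To see the mechanism in isolation, here is a minimal sketch on a hypothetical toy frame (`df`, `tag_cell` and `tagged` are illustrative names, not part of the question). It only shows that with `axis=1` each row arrives as a Series whose `.name` is the row label, which the inner `apply` can then hand to a per-cell function:

```python
import pandas as pd

# Hypothetical toy data, chosen only to make the row labels visible
df = pd.DataFrame({'a': [10, 20], 'b': [30, 40]}, index=['r1', 'r2'])

def tag_cell(cell, row_label):
    # Combine the row label with the cell value so the label's origin is obvious
    return f"{row_label}:{cell}"

# Outer apply: one Series per row, with row.name set to the row label.
# Inner apply: one call per cell in that row.
tagged = df.apply(lambda row: row.apply(lambda cell: tag_cell(cell, row.name)), axis=1)
print(tagged)
```

The column label is not passed to the inner function here; if it is needed as well, iterating over `row.index` inside the outer function is one option.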
ivallesp

Strictly speaking, you don't need to access the row index inside your function; there are simpler ways to implement this. You can probably do without it entirely, and you don't even need to do a pandas join/merge against the rows of key. But first, you need to fix your example data, if key is really supposed to be a dataframe of tuples.

So you want to:

  • sweep over each row with apply(..., axis=1)
  • look up the matching row of key with key.loc[key_index]...
  • ...which is supposed to give you a tuple key_tup, but in your example key was a simple dataframe, not a dataframe of tuples:
    key_tup = key.iloc[key_index]
  • the business with:
    length = (len(key_tup) - 1)
    random_int = random.randint(1, length)
    random_value = key_tup[random_int]
  • can be simplified to just:
    np.random.choice(key_tup)
    (note that random.randint(1, length) never picks position 0, so if skipping the first element is intentional, use np.random.choice(key_tup[1:]) instead)
  • in which case you likely don't need to declare apply_value()
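The steps above could be sketched roughly as follows, using the mock data from the question. This is one possible reading, not the only one: it assumes key is a plain dataframe of values (as in the example), that key and results share the same row index, and that every element of the matching row, including the first, is a valid draw. The name `sampled` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Mock data from the question
key = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
results = pd.DataFrame(np.zeros((3, 3)))

# For each row of results, draw one value per cell from the row of key
# that has the same index label (row.name). No separate apply_value() needed.
sampled = results.apply(
    lambda row: pd.Series(
        np.random.choice(key.loc[row.name].to_numpy(), size=len(row)),
        index=row.index,
    ),
    axis=1,
)
print(sampled)
```

Drawing a whole row at once with `size=len(row)` avoids the inner per-cell apply; the draws are with replacement, which matches calling the original function independently for each cell.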
smci