I have a table which has internal SKUs in column 0 and then synonyms along that row. The number of synonyms is not constant (ranging from 0 to 7, but will have a tendency to grow)
I need an effient function which will allow me to get SKUs from one column in a large table and translate them to synonym 0 from my other table.
This is my current function which takes an array of SKUs from one table, searches for them in another and gives me the first column value where it finds a synonym.
def new_array(dfarray1, array1, trans_dic):
missing_values = set([])
new_array = []
for value in array1:
pos = trans_dic.eq(str(value)).any(axis=1)
if len(pos[pos]) > 0 :
new_array.append(trans_dic['sku_0'][pos[pos].index[0]])
else:
missing_values.add(str(value))
if len(missing_values) > 0 :
print("The following values are missing in dictionary. They are in DF called:"+dfarray1)
print(missing_values)
sys.exit()
else:
return new_array
I'm sure that this is very badly written because it takes my laptop about 3 minutes to go through about 75K values only. Can anyone help me make this faster?
Some questions asked previously:
What types are your function parameters? (can guess pandas, but no way to know for sure)
Yes. I am working on two pandas dataframes.
What does your table even look like?
Dictionary table:
SKU0 | Synonym 0 | Synonym 1 | Synonym 2 |
---|---|---|---|
foo | bar | bar1 | |
foo1 | baar1 | ||
foo2 | baaar0 | baar2 |
Values table:
SKU | Value | Value1 | value1 |
---|---|---|---|
foo | 3 | 1 | 7 |
baar1 | 4 | 5 | 7 |
baaar0 | 5 | 5 | 9 |
Desired table:
SKU | Value | Value1 | value1 |
---|---|---|---|
foo | 3 | 1 | 7 |
foo1 | 4 | 5 | 7 |
foo2 | 5 | 5 | 9 |
What does the rest of your code that is calling this function look like?
df1.sku = new_array('df1', list(df1.sku), sku_dic)