
I have a table which has internal SKUs in column 0 and then synonyms along that row. The number of synonyms is not constant (currently 0 to 7, but it will tend to grow).

I need an efficient function which will let me take SKUs from one column of a large table and translate them to the canonical SKU (the SKU0 column) of my other table.

This is my current function. It takes an array of SKUs from one table, searches for each of them in the other table, and returns the first-column value of the row where a match is found.

import sys

def new_array(dfarray1, array1, trans_dic):
    missing_values = set()
    new_array = []
    for value in array1:
        # boolean Series marking dictionary rows that contain this value in any column
        pos = trans_dic.eq(str(value)).any(axis=1)
        if len(pos[pos]) > 0:
            # take sku_0 from the first matching row
            new_array.append(trans_dic['sku_0'][pos[pos].index[0]])
        else:
            missing_values.add(str(value))
    if len(missing_values) > 0:
        print("The following values are missing in the dictionary. They are in the DF called: " + dfarray1)
        print(missing_values)
        sys.exit()
    else:
        return new_array

I'm sure that this is very badly written, because it takes my laptop about 3 minutes to go through only about 75K values. Can anyone help me make this faster?


Some questions asked previously:

What types are your function parameters? (can guess pandas, but no way to know for sure)

Yes. I am working on two pandas dataframes.

What does your table even look like?

Dictionary table:

SKU0    Synonym 0    Synonym 1    Synonym 2
foo     bar          bar1
foo1    baar1
foo2    baaar0                    baar2

Values table:

SKU      Value   Value1   value1
foo      3       1        7
baar1    4       5        7
baaar0   5       5        9

Desired table:

SKU      Value   Value1   value1
foo      3       1        7
foo1     4       5        7
foo2     5       5        9

What does the rest of your code that is calling this function look like?

df1.sku = new_array('df1', list(df1.sku), sku_dic)

mixcocam

1 Answer


Given the dictionary dataframe in the format

import numpy as np
import pandas as pd

df_dict = pd.DataFrame({
    "SKU0": ["foo", "foo1", "foo2"],
    "Synonym 0": ["bar", "baar1", "baaar0"],
    "Synonym 1": ["bar1", np.nan, np.nan],
    "Synonym 2": [np.nan, np.nan, "baar2"]
})

and a values dataframe in the format

df_values = pd.DataFrame({
    "SKU": ["foo", "baar1", "baaar0"],
    "Value": [3, 4, 5],
    "Value1": [1, 5, 5],
    "value1": [7, 7, 9]
})

you can get the output you want by first using pd.melt to restructure your dictionary dataframe and then joining it to your values dataframe. Some extra logic then checks which column to take the final value from and selects the columns needed in the output.

(
    df_dict
    # converts dict df from wide to long format
    .melt(id_vars=["SKU0"])
    # filters rows where there is no synonym
    .loc[lambda x: x["value"].notna()]
    # join dictionary with values df
    .merge(df_values, how="right", left_on="value", right_on="SKU")
    # get final value by taking the value from column "SKU0" if available, else "SKU"
    .assign(SKU = lambda x: np.where(x["SKU0"].isna(), x["SKU"], x["SKU0"]))
    # select final columns needed in output
    [["SKU", "Value", "Value1", "value1"]]
)

# output
    SKU     Value   Value1  value1
0   foo     3       1       7
1   foo1    4       5       7
2   foo2    5       5       9
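
If you want to slot this in where the original new_array call was, one option (just a sketch, not part of the answer as posted; translate_values is a name made up here) is to wrap the chain in a function and assign its result back. It reuses the imports and dataframes defined above.

def translate_values(df_values, df_dict):
    """Return df_values with synonym SKUs replaced by their canonical SKU0."""
    return (
        df_dict
        .melt(id_vars=["SKU0"])
        .loc[lambda x: x["value"].notna()]
        .merge(df_values, how="right", left_on="value", right_on="SKU")
        .assign(SKU=lambda x: np.where(x["SKU0"].isna(), x["SKU"], x["SKU0"]))
        [["SKU", "Value", "Value1", "value1"]]
    )

# replaces the original call: df1.sku = new_array('df1', list(df1.sku), sku_dic)
df_values = translate_values(df_values, df_dict)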
Oxbowerce
  • @mixcocam Even if the tables are of different lengths this approach will work; the only thing you might need to change is how the two tables are merged, depending on how you want to treat those cases. – Oxbowerce Feb 09 '21 at 17:31
  • Thank you @oxbowerce. Is there any way to easily extract the SKUs that are not in my dictionary using your method? – mixcocam Feb 13 '21 at 09:03
  • Yes, if you change the merge type to `"outer"` and add the argument `indicator=True` you can see which records are in which table (see the sketch below these comments), see also [this answer](https://stackoverflow.com/a/38242463/9435355). – Oxbowerce Feb 13 '21 at 11:04
  • Do you know if this is possible using SQL? It would be great to achieve this transformation directly using sqlite. – mixcocam Oct 28 '21 at 18:12
  • This is definitely possible using SQL. While normally you could use `UNPIVOT` to convert the data from wide to long format, SQLite doesn't have this operator so you need to use several `SELECT` statements with a `UNION ALL` in between. See [this sqlfiddle example](http://sqlfiddle.com/#!7/5193d/8). – Oxbowerce Oct 29 '21 at 08:28
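
A minimal sketch of the `indicator=True` idea from the comments above, reusing the df_dict and df_values defined in the answer; nothing here goes beyond the standard pandas merge options.

import pandas as pd  # df_dict and df_values are the dataframes built above

# An outer merge keeps unmatched rows from both sides; indicator=True adds a
# "_merge" column labelling each row "left_only", "right_only" or "both".
merged = (
    df_dict
    .melt(id_vars=["SKU0"])
    .loc[lambda x: x["value"].notna()]
    .merge(df_values, how="outer", left_on="value", right_on="SKU", indicator=True)
)

# "right_only" rows come from the values table and matched no synonym.
# Note that an SKU which is already an SKU0 (like "foo" here) also lands in
# this bucket, because only the synonym columns were melted.
print(merged.loc[merged["_merge"] == "right_only", "SKU"].tolist())  # ['foo']

And a sketch of the SQLite route described in the last comment, emulating UNPIVOT with UNION ALL. It is run through Python's sqlite3 module; the table and column names (dictionary, sku_values, synonym_0, ...) are made up for this example rather than taken from the linked sqlfiddle, and only one value column is carried along to keep it short.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Stand-in tables mirroring the sample data from the question.
cur.executescript("""
CREATE TABLE dictionary (SKU0 TEXT, synonym_0 TEXT, synonym_1 TEXT, synonym_2 TEXT);
INSERT INTO dictionary VALUES
    ('foo',  'bar',    'bar1', NULL),
    ('foo1', 'baar1',  NULL,   NULL),
    ('foo2', 'baaar0', NULL,   'baar2');

CREATE TABLE sku_values (SKU TEXT, Value INTEGER);
INSERT INTO sku_values VALUES ('foo', 3), ('baar1', 4), ('baaar0', 5);
""")

# Emulate UNPIVOT with one SELECT per synonym column glued by UNION ALL,
# then map each incoming SKU to its SKU0 (falling back to the SKU itself).
rows = cur.execute("""
WITH long_dict AS (
    SELECT SKU0, synonym_0 AS synonym FROM dictionary WHERE synonym_0 IS NOT NULL
    UNION ALL
    SELECT SKU0, synonym_1 FROM dictionary WHERE synonym_1 IS NOT NULL
    UNION ALL
    SELECT SKU0, synonym_2 FROM dictionary WHERE synonym_2 IS NOT NULL
)
SELECT COALESCE(d.SKU0, v.SKU) AS SKU, v.Value
FROM sku_values AS v
LEFT JOIN long_dict AS d ON d.synonym = v.SKU
""").fetchall()

print(rows)  # foo/3, foo1/4, foo2/5 (row order may vary)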