
I've been iterating through strings in a pandas DataFrame to look for a certain set of words, and that part works.

However, I realised that I don't just want to find exact words. I also want to group together words that bear the same meaning as my main keyword.

I stumbled upon the question "How to return key if a given string matches the keys value in a dictionary", which is exactly what I want to do, but unfortunately I can't get it to work in a pandas DataFrame.

Below is one of the solutions which can be found in the link:

my_dict = {"color": ("red", "blue", "green"), "someothercolor":("orange", "blue", "white")}

solutions = []

my_color = 'blue'

for key, value in my_dict.items():
    if my_color in value:
        solutions.append(key)

Output (the contents of solutions, since 'blue' appears under both keys):

['color', 'someothercolor']

My data frame:

Now I have a data frame where I would like to iterate through df['Name'] to find a value and then add the matching key to a new column. In this example that would be df['Colour'].

+---+----------+--------------------------+-----------------------------+----------+--------+
|   |   SKU    |           Name           |         Description         | Category | Colour |
+---+----------+--------------------------+-----------------------------+----------+--------+
| 0 | 7E+10    | Red Lace Midi Dress      | Red Lace Midi D...          | Dresses  |        |
| 1 | 7E+10    | Long Armed Sweater Azure | Long Armed Sweater Azure... | Sweaters |        |
| 2 | 2,01E+08 | High Top Ruby Sneakers   | High Top Ruby Sneakers...   | Shoes    |        |
| 3 | 4,87E+10 | Tight Indigo Jeans       | Tight Indigo Jeans...       | Denim    |        |
| 4 | 2,2E+09  | T-Shirt Navy             | T-Shirt Navy...             | T-Shirts |        |
+---+----------+--------------------------+-----------------------------+----------+--------+

Expected result:

+---+----------+--------------------------+-----------------------------+----------+--------+
|   |   SKU    |           Name           |         Description         | Category | Colour |
+---+----------+--------------------------+-----------------------------+----------+--------+
| 0 | 7E+10    | Red Lace Midi Dress      | Red Lace Midi D...          | Dresses  | red    |
| 1 | 7E+10    | Long Armed Sweater Azure | Long Armed Sweater Azure... | Sweaters | blue   |
| 2 | 2,01E+08 | High Top Ruby Sneakers   | High Top Ruby Sneakers...   | Shoes    | red    |
| 3 | 4,87E+10 | Tight Indigo Jeans       | Tight Indigo Jeans...       | Denim    | blue   |
| 4 | 2,2E+09  | T-Shirt Navy             | T-Shirt Navy...             | T-Shirts | blue   |
+---+----------+--------------------------+-----------------------------+----------+--------+

My code:

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, value in colour.items():
            if value in x:
                return key
            else:
                return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

I get the following error:

TypeError: 'in <string>' requires string as left operand, not tuple

I can't test a tuple against a string. How should I approach this?

jpp
Bob Harris

2 Answers

1

You need to loop through each word in the tuple stored under each dictionary key.

As per the error message, you cannot check whether a tuple exists in a str type.
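To make the error concrete, here is a minimal sketch that reproduces it outside pandas and then tests each word on its own. The variable names `name`, `words`, `raised` and `matches` are illustrative and not part of the original code.

```python
name = "High Top Ruby Sneakers"
words = ("red", "rose", "ruby")

# Putting a tuple on the left of `in <str>` raises TypeError,
# because a str can only be searched for another str.
try:
    words in name
    raised = False
except TypeError:
    raised = True

# Testing each word of the tuple individually works as intended.
matches = [word for word in words if word in name.lower()]

print(raised)   # True
print(matches)  # ['ruby']
```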

In addition, make sure your else statement occurs after the outer for loop, so that all keys are tested before you output the default value.
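The effect of where the default value is returned can be sketched with two small functions. Both function names are hypothetical, and `math.nan` stands in for `np.nan` so the snippet needs no third-party imports.

```python
import math

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetch_early_return(name):
    # Default returned inside the outer loop: only the first key is ever tested.
    for key, values in colour.items():
        for value in values:
            if value in name.lower():
                return key
        return math.nan

def fetch_after_loop(name):
    # Default returned after the outer loop: every key gets a chance to match.
    for key, values in colour.items():
        for value in values:
            if value in name.lower():
                return key
    return math.nan

print(fetch_early_return('T-Shirt Navy'))  # nan
print(fetch_after_loop('T-Shirt Navy'))    # blue
```

With the early return, 'T-Shirt Navy' fails the 'red' words and the function gives up before 'blue' is checked.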

Finally, make sure you compare against x.lower(), since string matching is case-sensitive in Python.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Red Lace Midi Dress', 'Long Armed Sweater Azure',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy']})

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                return key
    else:
        return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

Result:

                       Name Colour
0       Red Lace Midi Dress    red
1  Long Armed Sweater Azure   blue
2    High Top Ruby Sneakers    red
3        Tight Indigo Jeans   blue
4              T-Shirt Navy   blue
jpp
  • Thank you so much for the thorough explanation, it works as expected. Let's say that I had more than one value that matched the df['Name'] column. Example: Blue and Red Lace Midi Dress. Would it be easy to store these in the same cell? (blue,red). – Bob Harris Apr 01 '18 at 06:52
  • It's possible. I suggest you have a go yourself. If you get stuck, you can ask a separate question. – jpp Apr 01 '18 at 10:12
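One possible way to approach the multi-match case raised in the comments is to collect every matching key instead of returning the first one. This is only a sketch under assumptions: the function name `fetch_all_colours` is hypothetical, the word 'blue' has been added to the 'blue' tuple (the original dictionary does not contain it), and the keys are joined into a single comma-separated string.

```python
import math

# Assumption: 'blue' is added to its own tuple so that "Blue ..." can match.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('blue', 'azure', 'indigo', 'navy')}

def fetch_all_colours(name):
    lowered = name.lower()
    # Keep every key for which at least one of its words occurs in the name.
    found = [key for key, values in colour.items()
             if any(value in lowered for value in values)]
    # Join the keys in dictionary order, or fall back to NaN when nothing matched.
    return ', '.join(found) if found else math.nan

print(fetch_all_colours('Blue and Red Lace Midi Dress'))  # red, blue
```

It can be applied in the same way as the single-match version, for example with df['Name'].apply(fetch_all_colours).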
0

You are trying to search for a tuple of words inside a string, whereas I guess you want to check whether any word of the tuple is in the string.

BTW, string comparisons are case-sensitive in Python.

You could replace:

if value in x: 

by

if any(word in x.lower() for word in value):
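Put together, the replacement might look like the sketch below. It assumes the default is returned only after all keys have been tested, uses `math.nan` in place of `np.nan` to stay dependency-free, and runs over a plain list of the names from the question rather than a DataFrame. The name `fetch_colour` is illustrative.

```python
import math

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetch_colour(x):
    lowered = x.lower()
    for key, value in colour.items():
        # True as soon as one word of the tuple occurs in the lower-cased name.
        if any(word in lowered for word in value):
            return key
    # No key matched after checking the whole dictionary.
    return math.nan

names = ['Red Lace Midi Dress', 'Long Armed Sweater Azure',
         'High Top Ruby Sneakers', 'Tight Indigo Jeans', 'T-Shirt Navy']
results = [fetch_colour(n) for n in names]
print(results)  # ['red', 'blue', 'red', 'blue', 'blue']
```

Note that `in` performs substring matching, so a word such as 'red' would also match inside a longer word that happens to contain it.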
Guillaume