I have a dataframe "result" and want to create a new column called "type". For each row, the value in "type" should be the dict value whose key appears (case-insensitively) in that row's "Particulars" column, or NaN if no key matches.

dict_classify = {'key1': 'content1',
                 'key2': 'content2'}

result['type']=[dict_classify[key] if key.lower() in i.lower() else np.nan 
                for key in dict_classify.keys() 
                for i in result['Particulars']]

It returns the error "Length of values (5200) does not match the length of index (1040)". Any idea what I did wrong?

The following is what I want to achieve in a normal for loop. Can I make it into one line?

lst_type=[]

for i in result['Particulars']:
    # default for rows where no key matches
    temp=np.nan
    for key in dict_classify:
        if key.lower() in i.lower():
            temp=dict_classify[key]
            break  # keep the first matching key
    lst_type.append(temp)

result['type']=lst_type
mogcai
  • The length of the array you generated is equal to `[the number of keys in dict_classify] * [length of result]` (the length of a dataframe is its number of rows), but what you need is an array whose length is the length of result. Seems like you're using a nested for loop where you should be doing something else. – Ben Grossmann Oct 21 '22 at 05:11
  • Can you show some example data in `result["Particulars"]`? – Stuart Oct 21 '22 at 07:24
  • For example, if the string "I have key1" is in `result["Particulars"]`, the corresponding cell in `result["type"]` will be "content1". – mogcai Oct 21 '22 at 07:27
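
To make the length mismatch concrete, here is a minimal sketch with made-up sample data (the three example rows are assumptions, not the real `result`). Two `for` clauses in one comprehension produce one element per (key, row) pair, which would be consistent with 5200 = 5 × 1040 if the real dictionary has five keys. Moving the key search inside the per-row expression gives one element per row, and `next` with a default mirrors the `break` in the loop version:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real "result" has 1040 rows.
result = pd.DataFrame({"Particulars": ["I have key1", "KEY2 is here", "nothing"]})
dict_classify = {"key1": "content1", "key2": "content2"}

# Two `for` clauses form a cross product: one element per (key, row) pair.
flat = [dict_classify[key] if key.lower() in i.lower() else np.nan
        for key in dict_classify
        for i in result["Particulars"]]
print(len(flat), len(result))  # 6 3

# One element per row: take the first matching key, or NaN if none matches.
result["type"] = [
    next((value for key, value in dict_classify.items()
          if key.lower() in i.lower()), np.nan)
    for i in result["Particulars"]
]
```

This assumes every entry in "Particulars" is a string; a missing value would fail on `i.lower()`.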

1 Answer

The most straightforward way is probably to iterate through the dictionary using loc to find cells that contain each key:

for key, value in dict_classify.items():
    # case=False matches the case-insensitive test used in the question
    result.loc[result["Particulars"].str.contains(key, case=False), "type"] = value
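
One thing to watch with this loop: when a row contains more than one key, the key processed last overwrites the earlier ones, whereas the loop in the question keeps the first match. The sketch below, on made-up sample data, iterates the dictionary in reverse so that the earliest key is written last and therefore wins. The `case=False`, `regex=False` and `na=False` arguments are additions here, chosen to get a literal, case-insensitive match that treats missing values as non-matches:

```python
import pandas as pd

# Hypothetical sample data; the third row contains both keys.
result = pd.DataFrame(
    {"Particulars": ["I have key1", "KEY2 is here", "key1 and key2", "nothing"]}
)
dict_classify = {"key1": "content1", "key2": "content2"}

# Reverse order: the first key in the dict is assigned last, so it takes priority.
for key, value in reversed(list(dict_classify.items())):
    mask = result["Particulars"].str.contains(key, case=False, regex=False, na=False)
    result.loc[mask, "type"] = value

print(result)
```

Rows that match no key are left as NaN in "type".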

You could also use a regex to identify the matched keys (like this answer). We can then use replace to get the values corresponding to each key.

regex = "(" + "|".join(dict_classify) + ")"
result["type"] = result["Particulars"].str.extract(regex).replace(dict_classify)

(You could of course condense this to one line if you really want to.)
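
If the keys might contain regex metacharacters, or the match needs to be case-insensitive as in the question, a slightly more defensive variant could look like the sketch below. The sample data and the names `pattern`, `lookup` and `matched` are illustrative assumptions, and `map` is swapped in for `replace` so the lookup can go through a lower-cased dictionary:

```python
import re
import pandas as pd

# Hypothetical sample data.
result = pd.DataFrame({"Particulars": ["I have key1", "KEY2 is here", "nothing"]})
dict_classify = {"key1": "content1", "key2": "content2"}

# Escape each key so characters like "." or "(" are matched literally.
pattern = "(" + "|".join(re.escape(key) for key in dict_classify) + ")"
# Lower-cased lookup table, because matches are lower-cased before mapping.
lookup = {key.lower(): value for key, value in dict_classify.items()}

# expand=False returns a Series (one capture group) instead of a DataFrame.
matched = result["Particulars"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
result["type"] = matched.str.lower().map(lookup)

print(result)
```

Note that the regex picks whichever key occurs first in the text, which is not necessarily the first key in the dictionary.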

Stuart