I have a large dataframe containing a 'Description'
column.
I've compiled a sizeable dictionary of lists, where the key is basically the Category, and the items are lists of possible (sub)strings contained in the description column.
I want to use the dictionary to classify each entry in the dataframe based on this description... Unfortunately I can't figure out how to apply a dictionary of lists to map to a dataframes (feels like it would be some sort of concoction of map
, isin
and str.contains
but I have had no joy). I've included code to generate a model dataset below:
df = pd.DataFrame(np.random.randn(10, 1), columns=list('A'))
df['Description'] = ['White Ford Escort', 'Irish Draft Horse', 'Springer \
spaniel (dog)', 'Green Vauxhall Corsa', 'White Van', 'Labrador dog',\
'Black horse' ,'Blue Van','Red Vauxhall Corsa','Bear']
This model dataset would then ideally be somehow mapped against the following dictionary:
dict = {'Car':['Ford Escort','Vauxhall Corsa','Van'],
'Animal':['Dog','Horse']}
to generate a new column in the dataframe, with the result as such:
| | A | Description | Type |
|---|----------------------|------------------------|--------|
| 0 | -1.4120290137842615 | White Ford Escort | Car |
| 1 | -0.3141036399049358 | Irish Draft Horse | Animal |
| 2 | 0.49374344901643896 | Springer spaniel (dog) | Animal |
| 3 | 0.013654965767323723 | Green Vauxhall Corsa | Car |
| 4 | -0.18271952280002862 | White Van | Car |
| 5 | 0.9519081000007026 | Labrador dog | Animal |
| 6 | 0.403258571154998 | Black horse | Animal |
| 7 | -0.8647792960494813 | Blue Van | Car |
| 8 | -0.12429427259820519 | Red Vauxhall Corsa | Car |
| 9 | 0.7695980616520571 | Bear | - |
The numbers are obviously irrelevant here, but there are other columns in the dataframes and I wanted this reflecting. I'm happy to use regex, or a perhaps change my dictionary to a dataframe and do a join (i've considered multiple routes).
This feels similar to a recent question, but it's not the same and certainly the answer hasn't helped me.
Sorry if I've been stupid somewhere and this is really simple - it does feel like it should be, but i'm missing something.
Thanks