I have a large df that has this structure:

import pandas as pd
data = pd.DataFrame({'a': ['red', 'blue', 'green', 'cat', 'dog'],
                     'b': [1, 1, 2, 3, 3]})

I have a dict that assigns categories like so:

category_dict = {'red': ['color'],
                 'blue': ['color'],
                 'green': ['color'],
                 'cat': ['animal'],
                 'dog': ['animal']}

I want to use the dict to create another column with the categories:

data_update = pd.DataFrame({'a': ['red', 'blue', 'green', 'cat', 'dog'],
                            'b': [1, 1, 2, 3, 3],
                            'c': ['color', 'color', 'color', 'animal', 'animal']})

I thought `data['c'] = category_dict[data['a']]` would give me this output, but instead I get the error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Liquidity
  • Why are your values inside lists? Wouldn't it be easier to just keep them strings and use `map`? – cs95 Jun 18 '19 at 16:49
  • Anyway, the answer to your question is `data.a.map(category_dict).str[0]` – cs95 Jun 18 '19 at 16:50
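
The one-liner from the comment above can be sketched end to end as follows. This is a minimal example built on the sample data from the question; the intermediate name `as_lists` is introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'a': ['red', 'blue', 'green', 'cat', 'dog'],
                     'b': [1, 1, 2, 3, 3]})
category_dict = {'red': ['color'], 'blue': ['color'], 'green': ['color'],
                 'cat': ['animal'], 'dog': ['animal']}

# map() looks up each value of column 'a' in the dict,
# producing a Series whose elements are the one-element lists
as_lists = data['a'].map(category_dict)

# .str[0] takes the first element of each list
data['c'] = as_lists.str[0]
print(data)
```

Splitting the call in two makes it easier to see that `map` does the lookup and `.str[0]` only unwraps the lists.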

2 Answers

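The values in `category_dict` are one-element lists, so flatten them to plain strings first and then use `map`: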

flatten_dict = {k:v[0] for k,v in category_dict.items()}

data['c'] = data['a'].map(flatten_dict)
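
For reference, here is a self-contained sketch of this approach using the sample data from the question. It also shows what `map` does with a value that has no entry in the dict; the value `'bird'` and the names `extra` and `mapped` are hypothetical and not part of the original data.

```python
import pandas as pd

data = pd.DataFrame({'a': ['red', 'blue', 'green', 'cat', 'dog'],
                     'b': [1, 1, 2, 3, 3]})
category_dict = {'red': ['color'], 'blue': ['color'], 'green': ['color'],
                 'cat': ['animal'], 'dog': ['animal']}

# Unwrap each one-element list so the dict maps string -> string
flatten_dict = {k: v[0] for k, v in category_dict.items()}

data['c'] = data['a'].map(flatten_dict)
print(data)

# Hypothetical case: 'bird' is not a key in the dict,
# so map() leaves a missing value (NaN) in that position
extra = pd.Series(['red', 'bird'])
mapped = extra.map(flatten_dict)
print(mapped)
```

If unmapped values are possible in the real data, checking `data['c'].isna()` afterwards is one way to spot them.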
Quang Hoang

A list comprehension that looks up each value and takes the first element of its list also works:

data['c'] = [category_dict[x][0] for x in data['a']]
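
A runnable sketch of the comprehension on the sample data from the question follows. One difference worth knowing: a plain dict lookup raises `KeyError` for a value that is not in the dict, whereas `Series.map` would produce a missing value instead. The value `'bird'` and the flag `raised` are hypothetical, used only to illustrate that case.

```python
import pandas as pd

data = pd.DataFrame({'a': ['red', 'blue', 'green', 'cat', 'dog'],
                     'b': [1, 1, 2, 3, 3]})
category_dict = {'red': ['color'], 'blue': ['color'], 'green': ['color'],
                 'cat': ['animal'], 'dog': ['animal']}

# Look up each value of 'a' and take the first element of its list
data['c'] = [category_dict[x][0] for x in data['a']]
print(data)

# Hypothetical case: 'bird' has no entry, so the lookup fails loudly
try:
    [category_dict[x][0] for x in ['red', 'bird']]
    raised = False
except KeyError:
    raised = True
print(raised)
```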

Nole