-1

I'm currently working with a pandas dataset (US startups) and am trying to aggregate sectors by keywords. In other words, I need to loop through a column and if a value contains a given string, replace the whole value with a new string.

I've already tried some simple "if" statement loops, but can't seem to get the syntax right. I've also tried .loc, but all I can manage is to replace every value in the column with one string.

Thanks!

Tommaso

2 Answers

0

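A simple way to do this is to store the mapping of sectors to sector categories in a dictionary, and then apply a function that looks each value up in that mapping. Note that this matches whole values exactly, so every sector string you want to replace needs its own dictionary entry.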

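import pandas as pd

data = pd.DataFrame(["chunky spam", "watery spam", "hard-boiled", "scrambled"])

# Exact sector value -> sector category
mapping_dict = {"chunky spam": "spam",
                "watery spam": "spam",
                "hard-boiled": "eggs",
                "scrambled": "eggs"}

def mapping(sector):
    # Fall back to the original value if the sector is not in the dictionary
    return mapping_dict.get(sector, sector)

# apply() returns a new Series, so assign it back to the column
data[0] = data[0].apply(mapping)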
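Since the question asks about values that contain a keyword rather than exact matches, here is a minimal sketch of a keyword-based variant. The column names "sector" and "category", the sample values, and the keyword_to_category dictionary are all assumptions made for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the column name "sector" is an assumption.
df = pd.DataFrame({"sector": ["chunky spam", "watery spam",
                              "hard-boiled eggs", "scrambled eggs", "toast"]})

# Hypothetical keyword -> category mapping.
keyword_to_category = {"spam": "spam", "egg": "eggs"}

original = df["sector"]
# Start from the original values so unmatched rows stay unchanged.
df["category"] = original

for keyword, category in keyword_to_category.items():
    # True where the original value contains the keyword
    # (literal substring, case-insensitive, missing values count as no match).
    mask = original.str.contains(keyword, case=False, regex=False, na=False)
    # Replace the whole value only in the matching rows.
    df.loc[mask, "category"] = category
```

The masks are computed against the original column, so an earlier replacement cannot accidentally match a later keyword. If a value contains several keywords, the last matching entry in the dictionary wins.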
Kevin Troy
0

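You can accomplish this using where(), applied to the column you want to change:

df["column_name"] = df["column_name"].where(df["column_name"] != "str", "replace")

where() keeps the values for which the condition is True and replaces those for which it is False. This is why we use the negated != when looking for "str" in the column: every value equal to "str" is replaced with the string "replace". Calling where() on the column rather than on the whole DataFrame keeps the replacement from spilling into the other columns of the matching rows. Note that != tests for exact equality, so values that merely contain "str" are left untouched.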
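To replace values that contain a string rather than equal it, the same idea can be combined with str.contains(). The following is a rough sketch; the DataFrame contents and the column names "sector" and "founded" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
df = pd.DataFrame({"sector": ["chunky spam", "watery spam", "scrambled"],
                   "founded": [2001, 2005, 2010]})

# True where the value contains the literal substring "spam".
contains_spam = df["sector"].str.contains("spam", regex=False, na=False)

# where() keeps values where the condition is True, so negate the mask
# to replace exactly the rows that contain the keyword.
df["sector"] = df["sector"].where(~contains_spam, "spam")
```

Series.mask() is the non-negated counterpart: df["sector"].mask(contains_spam, "spam") should give the same result without the ~.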

Philip Ciunkiewicz