
I need to replace parts of the text in a pandas DataFrame in Python, where the substrings to replace come from a big list of multi-word strings. I have written the following simple example to demonstrate the problem, along with my solution using a for loop. It works well, but when the list of words and the DataFrame are huge, the for loop becomes very expensive to run. I was wondering if there is any way to avoid the for loop here.

    import re
    import pandas as pd

    text = ['I am north and west', 'you are east and south']
    df = pd.DataFrame(text)

    def loop_names(s):
        words = ['north and west', 'east and south']
        # One re.sub call per phrase, repeated for every row
        for word in words:
            s = re.sub(re.escape(word), 'at location', s)
        return s

    df[0] = df[0].apply(loop_names)
    df

    # Alternatively:

    text = ['I am north and west','you are east and south']
    df = pd.DataFrame(text)
    words = ['north and west','east and south']
    for word in words:
        # Literal substring replacement; with regex=False no escaping is needed
        df[0] = df[0].str.replace(word, 'at location', regex=False)
    df

    # Alternatively:

    text = ['I am north and west','you are east and south']
    df = pd.DataFrame(text)
    words = ['north and west','east and south']
    for word in words:
        df[0] = df[0].apply(lambda x: re.sub(re.escape(word), 'at location', x))
    df
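A possible way to drop the Python-level loop over the phrases (a sketch, not part of the original question and not benchmarked here) is to join all escaped phrases into a single regex alternation, so that `Series.str.replace` scans each row once instead of once per phrase:

```python
import re

import pandas as pd

text = ['I am north and west', 'you are east and south']
df = pd.DataFrame(text)
words = ['north and west', 'east and south']

# Build one alternation pattern from all phrases. Python's re tries alternatives
# left to right, so sorting longest-first makes a longer phrase win over a
# shorter phrase that starts at the same position.
pattern = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# Single pass over the column: every phrase is replaced in one call.
df[0] = df[0].str.replace(pattern, 'at location', regex=True)
print(df)
```

For non-overlapping phrases like these, the result should match the loop versions. It can differ when phrases overlap, or when the replacement text itself contains a phrase from the list, because the loop re-scans its own output while the single pass does not.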