I have written a function in Python that takes a string as a parameter. I also have an Excel file loaded as a DataFrame with many rows, and I want to process each row of one column as a string. How do I do that?

The following function takes a string as input. Now I want to pass the DataFrame column to it. How do I do that?

def pre_process(utterance):
    utterance = remove_name(utterance)
    utterance = text_in_next_line_after_dot(utterance)
    utterance = convert_num_to_words(utterance)
    utterance = remove_stop_phrase(utterance)
    utterance = remove_character(utterance)
    utterance = remove_blank_lines(utterance)
    return utterance.strip()

The DataFrame looks like this:

id   Utterance
1    my name is cyley . I am at post91
2    after 24 hours you need to send the email
3    there interaction id is 123456
4    he is studying at masters school

I want to use each value of the Utterance column as the string input to the function above.

Rozakos
Sayli Jawale
  • Just look for the `apply` function. See it working here: https://stackoverflow.com/questions/16353729/why-isnt-my-pandas-apply-function-referencing-multiple-columns-working – MEdwin Jun 24 '19 at 10:58
  • I tried this, but it is not working. – Sayli Jawale Jun 24 '19 at 11:03
  • Okay, if you post sample data (similar to what you are working with) and the output you expect from your Python function, we can see whether there is a way to improve on it. – MEdwin Jun 24 '19 at 11:08
  • @MEdwin I have updated the question and added the sample data. I just want to know how to pass the whole column to my function pre_process, as my function takes a string as input. – Sayli Jawale Jun 24 '19 at 11:22
  • Doesn't map work in this example? your_df['Utterance_preprocessed']=your_df['Utterance'].map(pre_process) – jottbe Jun 24 '19 at 11:37

1 Answer

Here is a mockup. Basically, you update a DataFrame column using the logic in a function (here `remove_numbers`, which removes all digits from the Utterance column). Let me know if it works.

import pandas as pd
import re

df = pd.DataFrame({'id': [1,2,3,4],
                  'Utterance': [
                      'my name is cyley . I am at post91', 
                      'after 24 hours you need to send the email', 
                      ' there interaction id is 123456', 
                      'he is studying at masters school']})
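def remove_numbers(s):
    # Strip every run of digits from a single string
    return re.sub(r'\d+', '', s)


def pre_process():
    # Apply the string function to every row of the 'Utterance' column
    df['Utterance'] = df['Utterance'].apply(remove_numbers)
    # The other steps from the question can be chained the same way, e.g.:
    #df['Utterance'] = df['Utterance'].apply(text_in_next_line_after_dot)
    #df['Utterance'] = df['Utterance'].apply(convert_num_to_words)
    #df['Utterance'] = df['Utterance'].apply(remove_stop_phrase)
    #df['Utterance'] = df['Utterance'].apply(remove_character)
    #df['Utterance'] = df['Utterance'].apply(remove_blank_lines)
    return None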

pre_process()

df

Result below:

   Utterance                                 id
0  my name is cyley . I am at post           1
1  after hours you need to send the email    2
2  there interaction id is                   3
3  he is studying at masters school          4
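
If you would rather keep `pre_process` as a function that takes one string (as in the question), a minimal sketch is to hand the function itself to `apply` on the column; `map`, as suggested in the comments, should behave the same way here. The helper `collapse_spaces` and the bodies of both helpers below are hypothetical stand-ins, since the question does not show its own helper functions, and the new column name is only an example.

```python
import re

import pandas as pd


def remove_numbers(s):
    # Hypothetical stand-in helper: strip every run of digits
    return re.sub(r'\d+', '', s)


def collapse_spaces(s):
    # Hypothetical stand-in helper: squeeze repeated whitespace into one space
    return re.sub(r'\s+', ' ', s)


def pre_process(utterance):
    # Still takes ONE string and returns ONE string, as in the question
    utterance = remove_numbers(utterance)
    utterance = collapse_spaces(utterance)
    return utterance.strip()


df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'Utterance': [
                       'my name is cyley . I am at post91',
                       'after 24 hours you need to send the email',
                       ' there interaction id is 123456',
                       'he is studying at masters school']})

# apply calls pre_process once per row value and collects the results
df['Utterance_preprocessed'] = df['Utterance'].apply(pre_process)

print(df)
```

Writing to a new column keeps the original text available for comparison; assign back to `df['Utterance']` instead if you want to overwrite it.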
MEdwin