
Some background: My code takes user input and applies it to my DF to remove certain rows. This process repeats as many times as the user would like. Unfortunately, I am not sure how to update my DF within the while loop I have created so that it keeps the changes being made:

import pandas as pd
data = {'hello': ['the man', 'is a', 'good guy']}
df = pd.DataFrame(data)

def func():
    while True:
        n = input('Words: ')
        if n == "Done":
            break  
        elif n != "Done":
            pattern = '^'+''.join('(?=.*{})'.format(word) for word in n.split())
            df[df['hello'].str.contains(pattern)==False]

How do I update the DF at the end of each loop so the changes being made stay put?

user3682157
  • use `loc`: `df.loc[df['hello'].str.contains(pattern)==False, 'col'] = newVal` – EdChum Sep 30 '14 at 07:18
  • unsure of how this code works? can you please explain a little more if you don't mind! – user3682157 Sep 30 '14 at 14:26
  • `loc` uses label based indexing see the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing-loc-iloc-and-ix – EdChum Sep 30 '14 at 15:17
  • Your code throws an error value for me – user3682157 Sep 30 '14 at 15:29
  • And the error is.... – EdChum Sep 30 '14 at 15:39
  • Ha, sorry, that would have been helpful! It's a NameError: name 'newVal' is not defined – user3682157 Sep 30 '14 at 15:48
  • Sorry you misunderstand, my code snippet was just an example, your code seems incomplete as you are talking about assigning values but I don't see where this occurs – EdChum Sep 30 '14 at 15:50
  • Hi Ed -- The assignment comes from the user-inputted string: for example, if n = the man, that row in the DF gets removed by means of a regex. My need is to then update the DF so that row is permanently removed, because there may be multiple user inputs for rows to be taken out! – user3682157 Sep 30 '14 at 15:55
  • You can replace your wacky `elif` construction with a simple `else` in this case. (Or, in fact, with nothing at all. The `break` takes care of the else for you.) Go ahead and do it, it'll stop future readers of this question being distracted by it. – LondonRob Jun 30 '15 at 10:30

2 Answers


Ok, I reevaluated your problem and my old answer was totally wrong of course.

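What you want is the DataFrame.drop method, which can operate in place. Note that drop expects index labels rather than a boolean mask, so select the labels of the matching rows first:

mask = df['hello'].str.contains(pattern)
df.drop(df[mask].index, inplace=True)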

This will update your DataFrame.
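
As a minimal sketch of the drop approach on the question's sample data: the pattern below is a hypothetical stand-in for what the question's code would build if the user typed "the man", so no interactive input is needed.

```python
import pandas as pd

df = pd.DataFrame({'hello': ['the man', 'is a', 'good guy']})

# Hypothetical pattern, standing in for the one built from the input "the man"
pattern = '^(?=.*the)(?=.*man)'

# Boolean Series: True for rows whose text contains every word
mask = df['hello'].str.contains(pattern)

# drop() works on index labels, so pass the labels of the matching rows
df.drop(df[mask].index, inplace=True)

print(df['hello'].tolist())  # ['is a', 'good guy']
```

Because drop is called with inplace=True, the existing object is modified, so this form also works inside a function without rebinding the name df.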

firelynx

Looks to me like you've already done all the hard work, but there are two problems.

  1. Your last line doesn't store the result anywhere. Most Pandas operations are not "in-place", which means you have to store the result somewhere to be able to use it later.

  2. df is a global variable, and assigning to it inside a function creates a local name instead of updating the global one, unless the function explicitly contains a line stating global df. See the good answers to this question for more detail.

So I think you just need to do:

df = df[df['hello'].str.contains(pattern)==False]

to fix problem one.

For problem two, change the definition to def func(df), add return df at the end of func, and then call it like:

df = func(df)

OR, start func with the line

global df
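
Putting both fixes together, one possible sketch of the return-value variant looks like this. The function name remove_matching_rows and the queries parameter are assumptions made for illustration: the inputs are passed in as an iterable instead of being read with input(), so the example runs non-interactively. Using re.escape is also an addition that is not in the original code; it keeps regex metacharacters in the user's words from being interpreted as pattern syntax.

```python
import re
import pandas as pd

def remove_matching_rows(df, queries):
    """Return df without the rows whose 'hello' text contains every word of a query."""
    for query in queries:
        if query == "Done":
            break
        # One lookahead per word: a row matches only if it contains all of them
        pattern = '^' + ''.join(
            '(?=.*{})'.format(re.escape(word)) for word in query.split()
        )
        # Rebind the local name so each pass filters the already-filtered frame
        df = df[~df['hello'].str.contains(pattern)]
    return df

df = pd.DataFrame({'hello': ['the man', 'is a', 'good guy']})
df = remove_matching_rows(df, ['the man', 'guy', 'Done'])
print(df['hello'].tolist())  # ['is a']
```

To keep the interactive behaviour from the question, the same function could be called as remove_matching_rows(df, iter(lambda: input('Words: '), 'Done')), which keeps prompting until the user types Done.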
LondonRob