0

I have the following line

open.loc[(open['Genesis'] == 'Incomplete Name') & (open['Date'] >= last_date),'Result'] = 'Missing information'
  • 'last_date' is a variable equal to a specific date week (2022-25)
  • 'open' is my dataframe
  • 'Result' is a column of my dataframe, as well as 'Date' and 'Genesis'

What I'm trying to do is to transform that line of code into something where instead of specifying the value of 'Incomplete' in the ['Genesis'], I could have something like the 'LIKE' operator in SQL, to get more information, since the ['Genesis'] column has values like:

  • 'Incomplete Name'
  • 'Incomplete color'
  • 'Incomplete material'

And to replace the ['Result'] value with the ['Genesis'] value itself, since I do not want to manually specify every possible outcome of that column.

I've tried something like the following but I'm struggling to make it work:

 `for word in open['Genesis']:
    if word.startswith('Incomplete') and open['Date'] >= last_date:
      open['Result'] = open['Genesis']`

Thanks in advance!

  • I'm not sure, but maybe this other SO thread will help? : [pandas select from Dataframe using startswith](https://stackoverflow.com/q/17957890) – Savir Oct 26 '22 at 17:30
  • 3
    `open` should definitely NOT be the name of your dataframe. `open` is a python built-in for reading/writing files. You essentially stripped python of it's `open` features by naming your dataframe that. – OneMadGypsy Oct 26 '22 at 17:41
  • In the general case, you probably want regular expressions. They are more complex than SQL "like" statements but also significantly more powerful and versatile. – tripleee Oct 26 '22 at 18:00

1 Answers1

0

You can use the in operator in python. Basically the expression returns true if the incomplete value is "subset" of the other complete value.

So you can do:

open.loc[('Incomplete Name' in open['Genesis']) & (open['Date'] >= last_date),'Result'] = 'Missing information'

** EDIT ** try this one:

for word in words:
  if word in open['Genesis'] and open['Date'] >= last_date:
    open['Result'] = open['Genesis']`
Shoaib Amin
  • 207
  • 1
  • 4
  • Thanks! But that only solves part of my issue, since I want to have the 'Incomplete...' values as a result in the 'Result' column, since they are different types of incomplete messages as I mentioned – exenlediatán Oct 26 '22 at 17:30
  • Please take a look at my edited answer. If you do not have a word array then you can skip the first line of the code (for word in words:) – Shoaib Amin Oct 26 '22 at 17:34
  • You are shadowing the `open` built-in. Your answer should change that and mention why you changed it. – OneMadGypsy Oct 26 '22 at 17:40
  • Hey @ShoaibAmin, I've tried the second option and it does not work, Try to find values in the 'Genesis' column that starts with 'Incomplete' and those that starts with 'Incomplete', in the column called 'Result', display the value that they have in the 'Genesis' column. – exenlediatán Oct 26 '22 at 20:35