I have the following DataFrame containing various product names. It looks like this:

                                       Name  
0                            1 Enkelt (35%) 
1                          1 Klasses Bitter  
2               1 minute Urban Protect Mask  
3                       10 Years Tawny Port 
4                             100% Frugtbar  
5                       100% Klementinjuice  
6                            100% Kokosvand
7                    1000 kernerugbrød øko. 

Take this product: "1000 kernerugbrød øko.". I am trying to write conditions that remove the "øko." from the end and, based on the Danish rules for singular and plural, add either "Økologisk" (singular) or "Økologiske" (plural) in front of the name. In this case, because "kernerugbrød" does not end with the letter r, it should be "Økologisk".

So basically the idea is like this:

I have a row containing this value in the Name column: "1000 kernerugbrød øko." -> I remove the "øko.", resulting in "1000 kernerugbrød" -> I check whether the last letter is r or not -> I add "Økologisk" or "Økologiske" depending on the previous step -> The final string should then be: "Økologisk 1000 kernerugbrød".
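For a single string, the steps above can be sketched in plain Python. This is only an illustration: the function name `add_organic_prefix` is hypothetical, the pattern assumes the "øko." marker sits at the end of the name, and "Gulerødder øko." is a made-up input that is not in the DataFrame above.

```python
import re

def add_organic_prefix(name: str) -> str:
    """Move a trailing 'øko.' marker to the front as 'Økologisk'/'Økologiske'."""
    # Remove an optional comma, one whitespace character and a literal "øko."
    # at the end of the name (assumption: the marker is always at the end).
    stripped, n_subs = re.subn(r",?\søko\.\s*$", "", name)
    if n_subs == 0:
        return name  # no marker found, leave the name unchanged
    stripped = stripped.rstrip()
    # Plural if the remaining name ends with "r", singular otherwise.
    prefix = "Økologiske " if stripped.endswith("r") else "Økologisk "
    return prefix + stripped

print(add_organic_prefix("1000 kernerugbrød øko."))  # Økologisk 1000 kernerugbrød
print(add_organic_prefix("Gulerødder øko."))         # Økologiske Gulerødder
```

The same logic has to be applied to every row at once in pandas, which is why a plain `if` on the whole column does not work.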

My attempt was the following:

text = "Økologisk "
text2 = "Økologiske "

# creating a new column of booleans to check which Name contains "øko."
df['test'] = df['Name'].str.contains(",?\søko.")

# replacing "øko." with an empty string
df['Name'] = df['Name'].str.replace(r',?\søko.', "")

if df['test']: #if the Name contained "oko."
    if df['Name'].str.contains("r(\s)?$"): #checking for plural
        df['Name'] = text2 + df['Name']
    else:
        df['Name'] = text + df['Name']

However, I am getting this error at the line if df['test']:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried using the suggestions above but none of them actually helps me with this task. Therefore, what should I do to fix my code OR how else should my code be written in order to achieve a correct solution for this problem?

Questieme

1 Answer

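I think you can use numpy.where twice (nested):

import numpy as np

text = "Økologisk "
text2 = "Økologiske "

# boolean mask: which names contain "øko." (the dot is escaped so it matches literally)
m1 = df['Name'].str.contains(r",?\søko\.")

# remove "øko."; regex=True is required in pandas 2.0+, where str.replace matches literally by default
df['Name'] = df['Name'].str.replace(r',?\søko\.', "", regex=True)

# boolean mask: which cleaned names end with "r" (plural)
m2 = df['Name'].str.contains(r"r\s?$")

# keep rows without "øko." unchanged, otherwise prepend the matching prefix
df['Name'] = np.where(~m1, df['Name'],
             np.where(m2, text2, text) + df['Name'])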
jezrael
  • Awesome! I actually tried to figure out whether I could use np.where for this, but after a while I thought the syntax wouldn't allow me to accomplish it. Your code did the job, however. Really appreciate the help! – Questieme Sep 30 '19 at 12:45
  • Excuse me, is `~` an operator in pandas? I feel like asking a question about it, but I'm not sure. – Celius Stingher Sep 30 '19 at 17:56
  • @CeliusStingher - It is the element-wise logical NOT operator for a boolean mask, check [this](https://stackoverflow.com/questions/15998188/how-can-i-obtain-the-element-wise-logical-not-of-a-pandas-series) – jezrael Oct 01 '19 at 04:46
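As a small illustration of the `~` operator discussed in the comments, the sketch below inverts a boolean Series element by element. The variable names are made up for the example.

```python
import pandas as pd

mask = pd.Series([True, False, True])

# "~" flips every element of a boolean Series.
inverted = ~mask
print(inverted.tolist())  # [False, True, False]

# Python's "not" asks for a single truth value of the whole Series,
# which pandas refuses to guess for a multi-element Series.
try:
    not mask
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

This is the same ValueError as in the question: `if df['test']:` also asks for one truth value for an entire column.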