I want to take the string 'APPLES_10_4' inside a dataframe and have it become 'APPLES'. The code I have come up with is below:

import pandas as pd
data = ['APPLES_10_4']

Name_Parameters = []
df = pd.DataFrame(data, columns = ['fruit'], index = ['count'])

    
def badletters(lastletter):
    badletters = ["1","2","3","4","5","6","7","8","9","_"]
    if lastletter in badletters:
        return True
    else:
        return False   

def stripe(variable):
    tempStrippedVariable = variable
    foundEndVariable = False
    while not foundEndVariable:
        lastletter = tempStrippedVariable [:-1]
        if badletters(lastletter):
            tempStrippedVariable = tempStrippedVariable [:-1]
        else:
            foundEndVariable = True
    strippedVariable = tempStrippedVariable
    return strippedVariable

for variable in df:
    strippedVariable = stripe(str(variable))
    prefixes = []
    if strippedVariable not in prefixes:
        prefixes.append(strippedVariable)
print(df)

The output I am getting is the original dataframe with ['APPLES_10_4'], not the altered one with ['APPLES'].

Nimantha

1 Answer

If some of the dataframe elements are integers rather than strings, you can convert them to strings before calling stripe():

for variable in df:
    strippedVariable = stripe(str(variable))
    if strippedVariable not in prefixes:
        prefixes.append(strippedVariable)
print(prefixes)

Or you could just skip them.

for variable in df:
    if not isinstance(variable, str):
        continue
    strippedVariable = stripe(variable)
    if strippedVariable not in prefixes:
        prefixes.append(strippedVariable)
print(prefixes)

Another bug is in stripe():

lastletter = tempStrippedVariable [:-1]

should be

lastletter = tempStrippedVariable [-1]

You're setting lastletter to the entire string except the last letter.
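
To make the difference concrete, here is a minimal sketch using the sample value from the question. The variable names all_but_last and last_char are only illustrative and do not appear in the original code.

```python
s = "APPLES_10_4"

# A slice with [:-1] keeps everything except the final character.
all_but_last = s[:-1]

# An index with [-1] returns only the final character.
last_char = s[-1]

print(all_but_last)  # APPLES_10_
print(last_char)     # 4
```

Because the slice is a multi-character string, it never matches any single-character entry in badletters, so the while loop stops on its first pass and returns the input unchanged.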

But that whole function can be replaced with simply:

def stripe(variable):
    # str.rstrip() takes a string of characters to strip, not a list;
    # "0" is included so a suffix like "_10" is removed in full
    return variable.rstrip("0123456789_")
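
As a quick check of how str.rstrip() behaves here, the sketch below assumes the characters to remove are the ten digits plus the underscore. The names SUFFIX_CHARS and strip_suffix are hypothetical and used only for illustration. Note that rstrip() expects a single string whose characters form the set to strip, and passing a list raises a TypeError.

```python
SUFFIX_CHARS = "0123456789_"  # assumed set of trailing characters to remove


def strip_suffix(name):
    """Remove any trailing run of digits and underscores from name."""
    return name.rstrip(SUFFIX_CHARS)


print(strip_suffix("APPLES_10_4"))  # APPLES
print(strip_suffix("PEARS"))        # PEARS

# rstrip() accepts a string (or None), not a list of characters.
try:
    "APPLES_10_4".rstrip(list(SUFFIX_CHARS))
except TypeError as exc:
    print("TypeError:", exc)
```

One caveat with this approach: a name that legitimately ends in a digit would lose that digit as well, so it only fits data where trailing digits are always part of the suffix.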

Finally, "for variable in df" iterates over the column names, not the dataframe contents. To visit the values, iterate over the rows instead. See "How to iterate over rows in a DataFrame in Pandas".

prefixes = []
for row in df.itertuples(index=False):
    variable = row[0]  # first column; with index=False the index is omitted
    strippedVariable = stripe(variable)
    if strippedVariable not in prefixes:
        prefixes.append(strippedVariable)
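
Since the question asks for the dataframe itself to change, it may also help to note that none of the loops above write anything back to df. One possible way to do that, sketched here under the assumption that the column is named 'fruit' as in the question, is the vectorized Series.str.rstrip() with the result assigned back to the column:

```python
import pandas as pd

df = pd.DataFrame(["APPLES_10_4"], columns=["fruit"], index=["count"])

# Strip trailing digits and underscores from every value in the column,
# then assign the result back so the dataframe is actually modified.
df["fruit"] = df["fruit"].str.rstrip("0123456789_")

# Collect the distinct prefixes in order of first appearance.
prefixes = df["fruit"].unique().tolist()

print(df)
print(prefixes)
```

This avoids the explicit row loop entirely. Whether it suits the real data depends on the column holding only strings.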
Barmar