
I would like to read rows of data from a .csv file, but whenever I run my code it raises this error. I have no idea how to solve this problem. I've looked at some similar posts, but I still can't solve it.

Here is my code:

import statistics

def getThreshold(dataSet, Attributes, isNumeric):
    '''
        Calculates median threshold from train dataset
    '''
    thresholds = []
    for x in Attributes:
        indx = Attributes.index(x)
        numeric = isNumeric[indx]
        if numeric:
            listAtt = []
            for row in dataSet:
                listAtt.append(float(row[indx]))
            # calculate the median of a numeric attribute column
            median = statistics.median(listAtt)
            thresholds.append(median)
    return thresholds

Here is my sample data (without quotes):

41,management,single,secondary,no,764,no,no,cellular,12,jun,230,2,-1,0,unknown,no
39,blue-collar,married,secondary,no,49,yes,no,cellular,14,may,566,1,370,2,failure,no
60,retired,married,primary,no,0,no,no,telephone,30,jul,130,3,-1,0,unknown,no
31,entrepreneur,single,tertiary,no,247,yes,yes,unknown,2,jun,273,1,-1,0,unknown,no

The problem is in the first column, the age, whose values are identified as strings. Is the problem in the CSV file or in the code?

Mandera
John Khor
  • Please reduce and enhance this into the expected [MRE](https://stackoverflow.com/help/minimal-reproducible-example). – Prune May 26 '20 at 06:09

1 Answer


Converting a value to float raises a ValueError if the conversion is not possible. I changed your code to check for that.

import statistics

def getThreshold(dataSet, Attributes, isNumeric):
    '''
        Calculates median threshold from train dataset
    '''
    thresholds = []
    for x in Attributes:
        indx = Attributes.index(x)
        numeric = isNumeric[indx]
        if numeric:
            listAtt = []
            for row in dataSet:
                value = row[indx]
                # Try to convert the value to float; if that fails, keep the original value
                try:
                    value = float(value)
                except ValueError:
                    pass
                listAtt.append(value)
            # calculate the median of a numeric attribute column
            median = statistics.median(listAtt)
            thresholds.append(median)
    return thresholds

As a side note: by convention (PEP 8), variable and function names should start with a lower-case letter. Only class names should start with an upper-case letter.
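To find out which values cannot be converted, it can help to collect them and print their `repr()`, which makes invisible characters visible. This is only a sketch: `find_unparseable` is a hypothetical helper name, and the sample rows are made up, with an invisible character placed in front of the first value.

```python
def find_unparseable(rows, index):
    """Return (row_number, value) pairs whose value at `index` is not a valid float."""
    bad = []
    for row_number, row in enumerate(rows):
        try:
            float(row[index])
        except ValueError:
            bad.append((row_number, row[index]))
    return bad


# Made-up sample rows; the first value starts with an invisible U+FEFF character.
rows = [
    ["\ufeff41", "management"],
    ["39", "blue-collar"],
]

for row_number, value in find_unparseable(rows, 0):
    # repr() shows escape sequences instead of the invisible character itself
    print(row_number, repr(value))
```

If the printed value contains something unexpected, the file contents or its encoding are the likely cause rather than the median calculation.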

Mandera
  • I have identified the error. The first column of the data is identified as a string. May I know how to solve it? Is it the problem in code or in the .csv file? – John Khor May 26 '20 at 06:35
  • If you are using a pandas DataFrame, you can convert the type easily, see https://stackoverflow.com/questions/16729483/converting-strings-to-floats-in-a-dataframe. You should also try not to iterate over a DataFrame, see https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas. Basically, I recommend rewriting it so that you get a DataFrame directly from reading the CSV file, then using the built-in pandas methods to get medians and such. – Mandera May 26 '20 at 06:40
  • The only problem I'm having now is that the first column is recognized as a string instead of a number; the other numeric columns are fine. I tried changing the format of the columns in Excel, but the changes can't be saved. May I know how to solve this? – John Khor May 26 '20 at 07:48
  • Did you try `pd.Series.astype(float)` or `pd.to_numeric` as the links I posted explained? – Mandera May 26 '20 at 07:50
  • I've tried, but it's still the same. I'm getting `\ufeff43` instead of the number 43. – John Khor May 26 '20 at 08:08
  • OK, I might have solved the issue by saving my .csv file as a normal CSV instead of with UTF-8 encoding. Thank you so much for the help! – John Khor May 26 '20 at 08:11
  • Oh okay, awesome. Yeah, I just found the same thing; it seems to be an encoding issue: https://stackoverflow.com/questions/17912307/u-ufeff-in-python-string. You're welcome! Glad I could help. – Mandera May 26 '20 at 08:12
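
The `\ufeff` in front of the first value is a byte order mark (BOM), which some programs write at the start of a file saved as UTF-8. Besides re-saving the file, another option is to open it with Python's `utf-8-sig` codec, which skips a leading BOM and reads files without one unchanged. The following is a minimal sketch, assuming the file is read with the standard `csv` module; `read_rows` and `column_median` are hypothetical helper names, not part of the original code.

```python
import csv
import statistics


def read_rows(path):
    """Read all rows of a CSV file, skipping a leading UTF-8 BOM if present."""
    # "utf-8-sig" decodes UTF-8 and drops a byte order mark at the start of the file.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.reader(f))


def column_median(rows, index):
    """Return the median of the column at `index`, converting each value to float."""
    return statistics.median(float(row[index]) for row in rows)
```

With the rows read this way, the first value arrives as `'41'` rather than `'\ufeff41'`, so `float()` can convert it like the other numeric columns.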