I have a .csv file from perfmon. The file has 6000 records and looks like this:

(PDH-CSV 4.0) (SA Pacific Standard Time)(300),"\\server1\PhysicalDisk(_Total)\% Disk Read Time","\\server1\PhysicalDisk(_Total)\% Disk Write Time"
10/30/2017 15:00:15.568," "," "
10/30/2017 15:00:30.530,"25.763655942362824","130.21748494987176"
10/30/2017 15:00:45.518,"25.591636684958058","135.81093813384427"

I need to get the min, max and 95th percentile of columns 1 and 2. However, as a newbie, I can't get past the first challenge, which consists of converting every single value to a number:

import csv
sum = 0
fila = 0

with open('datos_header.csv') as csvfile:
    leercsv = csv.reader(csvfile, delimiter = ',')
    csvfile.__next__()
    for col in leercsv:
        col1 = (col[1])
        subtot = float(col1 * 4)
#        fila = fila + 1
#        sum = col1 + float(col)

#tot = sum / fila
    print(subtot)

and I get:

Traceback (most recent call last):
  File "", line 10, in
ValueError: could not convert string to float:

I've tried:

  - removing the header
  - removing every non-numeric character, such as / or :, using a regex
  - removing blank values

Having said that:

  1. Besides the error, do you think I'm on the right path to get the min, max and 95th percentile?
  2. If so, what needs to be done in my code to convert the strings to floats?
  3. If not, would you please assist?

Thank you!

HelloWorld
  • can you print the record that is failing? that will give us the most amount of information in regards to why you are getting the `ValueError` – MattR Nov 09 '17 at 21:40
  • If you want a built-in solution, you can use [pandas](https://pandas.pydata.org) to read the csv file and then use the quantile function as described [here](https://stackoverflow.com/questions/39581893/pandas-find-percentile-stats-of-a-given-column) – Daniel Lenz Nov 09 '17 at 21:41
  • If you multiply a string like e.g. "2.5" with 4 it results in "2.52.52.52.5" which isn't a float. – Michael Butscher Nov 09 '17 at 21:41
  • Please do not post your entire homework assignment as a question. Focus your question on a specific issue. For example, you could post just line 10 of your code and the exception you are getting, and ask how to convert string to float properly. – jprusakova Nov 09 '17 at 21:42
  • The second line of your file is `10/30/2017 15:00:15.568," "," "`. The last two columns are spaces which cannot be converted to float. Wrap your calls to `float` in try-except and then deal with those situations in an except clause. You may want to skip those rows, in which case you can `continue`. Or you can set a default value in those cases. – Steven Rumbalski Nov 09 '17 at 21:42
  • Also `subtot = float(col1 * 4)` won't work for valid string representations of a float. You probably want `subtot = float(col1) * 4`. – Steven Rumbalski Nov 09 '17 at 21:45
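
The two problems raised in the comments can be reproduced in isolation. This is only a minimal sketch using literal values; the variable names are made up for illustration:

```python
# Multiplying a string repeats it; it does not do arithmetic.
repeated = "2.5" * 4          # '2.52.52.52.5', which float() cannot parse

# Convert first, then multiply.
scaled = float("2.5") * 4     # 10.0

# A cell holding only a space, like the first data row, cannot be converted.
try:
    float(" ")
    blank_converts = True
except ValueError:
    blank_converts = False
```

Either problem on its own is enough to raise the `ValueError` from the question.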

1 Answer

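You have to check that the value is not blank before converting it, and convert it to float before multiplying. Note that a cell containing only a space is still a non-empty string, so strip it before testing:

for col in leercsv:
    col1 = col[1]
    if col1.strip():  # skip cells that are empty or only whitespace
        subtot = float(col1) * 4  # convert to float first, then multiply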

A more robust solution:

for col in leercsv:
    col1 = col[1]
    try:
        subtot = float(col1) * 4
    except ValueError:
        continue  # skip rows whose value is not numeric, such as " "
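
For the min, max and 95th percentile the question asks about, the same try/except idea can be extended to collect the values. The following is only a sketch under a few assumptions: `column_stats` is a hypothetical helper name, the percentile uses the nearest-rank definition (other definitions interpolate and give slightly different numbers), and the sample header is simplified from the one in the question.

```python
import csv
import io

def column_stats(lines, column):
    """Return (min, max, 95th percentile) of one column, skipping blank cells."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            continue  # blank or malformed cell, e.g. " "
    if not values:
        raise ValueError("no numeric values in column")
    values.sort()
    # Nearest-rank percentile: rank = ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(values) + 99) // 100
    return values[0], values[-1], values[rank - 1]

# Sample rows copied from the question; the header is shortened for readability.
sample = io.StringIO(
    'time,"read","write"\n'
    '10/30/2017 15:00:15.568," "," "\n'
    '10/30/2017 15:00:30.530,"25.763655942362824","130.21748494987176"\n'
    '10/30/2017 15:00:45.518,"25.591636684958058","135.81093813384427"\n'
)
low, high, p95 = column_stats(sample, 1)
```

With the real file, the same function could be called as `column_stats(csvfile, 1)` inside the `with open('datos_header.csv') as csvfile:` block, and again with `2` for the write-time column (reopening the file or rewinding it first).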
efirvida
  • Thank you efirvida, it worked after adding the error handling. I suppose that confirms there is a string of spaces in the data, as Steven Rumbalski mentioned. Is my assumption correct? – HelloWorld Nov 09 '17 at 23:45
  • @HelloWorld, yes. The try/except catches any error raised inside it, so if the float conversion raises an exception, the code will execute the except branch, which in this case is `pass`, meaning do nothing. – efirvida Nov 10 '17 at 01:28