
Here is my code snippet to get the data I need from the CSV:

    import pandas as pd

    pathName = 'pathName'  # pathName: find the correct path for the file
    # skiprows: the first row is occupied by the title; we don't need that
    export = pd.read_csv(pathName, skiprows=[0], header=None)
    omsList = export.values.T[1].tolist()  # transpose the matrix + get the second column
    for omsID in omsList:
        productOMS = omsID

Here is how I'm yielding said item:

    item['productOMS'] = productOMS
    yield item

Here is the column I am trying to get data from:

[Screenshot: the target column in the CSV file]

When I run my spider, I get nan as the output for omsID, which, after some research, I found out means "not a number". That would make sense, since I think these values would be considered strings. How would I adjust my program so that it recognizes these data fields as strings rather than ints, or reads them in as ints?
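One way to approach the question as asked is to tell pandas the column's type up front. The sketch below is a minimal, standalone illustration, not the asker's code: it replaces the real file with an inline CSV (the title row and the ID values are invented) and passes `dtype=str` to `pd.read_csv` so every field is kept as text, which also preserves any leading zeros. Note that an empty cell still comes back as NaN: in pandas, NaN marks a missing value rather than a string that failed to parse.

```python
import io

import pandas as pd

# Hypothetical CSV contents: a title row, then rows whose second column holds the IDs.
# The IDs are invented for illustration; the last row has an empty ID cell.
csv_text = """Export title
Widget,100123456
Gadget,200987654
Gizmo,
"""

# skiprows=[0] drops the title row; header=None numbers the columns 0, 1, ...
# dtype=str keeps every field as text, so the IDs are not converted to numbers.
export = pd.read_csv(io.StringIO(csv_text), skiprows=[0], header=None, dtype=str)

# Column 1 is the second column; selecting it directly avoids the transpose.
omsList = export[1].tolist()

for omsID in omsList:
    if pd.isna(omsID):
        # An empty cell is still reported as NaN, even with dtype=str.
        print("missing ID")
    else:
        print(omsID, type(omsID).__name__)
```

Selecting `export[1]` picks the same values as the `export.values.T[1]` transpose in the question, since `header=None` labels the columns with integers.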

chrisHG

2 Answers


You need to use Python's type conversion (casting): for example, `int(my_numerical_string)` tells Python to interpret the text as an integer. You can also use `type(my_var)` to find out the type of your variable.
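A small standalone sketch of the casting described in this answer, using a made-up ID string (`my_numerical_string` is a hypothetical value). It also shows why casting does not help with a missing value: pandas represents an empty cell as the float NaN, `int()` raises `ValueError` for it, and `float()` simply returns NaN again, so `math.isnan` is the reliable check.

```python
import math

# A numeric string converts cleanly with int().
my_numerical_string = "100123456"  # hypothetical value for illustration
as_int = int(my_numerical_string)
print(type(my_numerical_string).__name__, "->", type(as_int).__name__)  # str -> int

# A missing cell read by pandas is the float NaN; int() cannot convert it.
missing = float("nan")
try:
    int(missing)
except ValueError as err:
    print("ValueError:", err)  # cannot convert float NaN to integer

# float() accepts NaN but just returns NaN again, so test for it explicitly.
value = float(missing)
print(math.isnan(value))  # True
```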

dryliketoast
  • When I did `productOMS = int(omsID)`, I got the error: ValueError: cannot convert float NaN to integer. Then I tried `productOMS = float(omsID)` and got nan again. – chrisHG Mar 07 '20 at 00:18
  • You want to cast it as a `float`, as in a floating-point number, like this: `productOMS = float(omsID)` – dryliketoast Mar 07 '20 at 00:20
  • I did, and it returned nan. – chrisHG Mar 07 '20 at 00:22
  • Ignore that... I think this is your problem (https://stackoverflow.com/questions/47333227/pandas-valueerror-cannot-convert-float-nan-to-integer). Are there holes (empty cells) in your CSV data? – dryliketoast Mar 07 '20 at 00:23
  • I tried a few variations of what's in the link and got the error "float object has no attribute []". I tried fixing that by flooring the value to create an int, along with some other things, but got the same error. – chrisHG Mar 07 '20 at 00:41
  • Also, there are no holes in the CSV file. – chrisHG Mar 07 '20 at 00:41
  • At this point I would start printing out the values and looking at what's going on. – dryliketoast Mar 07 '20 at 00:44
  • I'm reusing this code from earlier in my program: `start_urls = ['https://www.homedepot.com/p/{omsID}'.format(omsID = omsID) for omsID in omsList]`. Do you know why it would print nan in the item but print fine in this snippet? – chrisHG Mar 07 '20 at 00:47
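Following the suggestion to print the values and look for holes, here is a minimal diagnostic sketch. The CSV text is invented and contains one deliberately empty ID cell; printing each value with `repr()` makes NaN and stray whitespace easy to spot, and `Series.isna()` selects the rows that pandas read as missing.

```python
import io

import pandas as pd

# Hypothetical export with one empty cell in the ID column (column 1).
csv_text = """Export title
Widget,100123456
Gadget,
Gizmo,300555777
"""

export = pd.read_csv(io.StringIO(csv_text), skiprows=[0], header=None, dtype=str)

# Print every value with repr() so NaN and stray whitespace are visible.
for row_number, omsID in export[1].items():
    print(row_number, repr(omsID))

# Keep only the rows whose ID cell was read as missing (NaN).
holes = export[export[1].isna()]
print(holes)
```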

This was a silly problem that I did not see coming: I had to increase the width of the target column in Excel so that the values could actually be read in.

chrisHG