I'm iterating through the rows of a dataframe to extract values as follows, but what I get back is always a float, and I'm not able to convert it to int for either result["YEAR_TORONTO"] or result["YEAR_TORONTO2"]:

for i in range(0, len(result)):
    if result["SOURCE_DATASET"].iloc[i] == "toronto":
        result["YEAR_TORONTO"].iloc[i] = pd.to_datetime(result["START_DATE"].iloc[i]).year
        result["YEAR_TORONTO"].iloc[i].astype(int) if not np.isnan(result["YEAR_TORONTO"].iloc[i]) else np.nan
        result["YEAR_TORONTO2"].iloc[i] = result["YEAR_TORONTO"].iloc[i]

Any idea why this could be? I've tried multiple approaches, including pd.to_numeric and round(), but had no luck with any of them.

Interestingly enough, when I output result["YEAR_TORONTO"].iloc[1].astype(int) if not np.isnan(result["YEAR_TORONTO"].iloc[1]) else np.nan, I get 2016 as an int, but once I output the entire dataframe by calling result, I still get 2016.0 as a float.

Sample Data (Input):

    SOURCE_DATASET  START_DATE
0   brampton        06-04-16
1   toronto         06-04-16
2   brampton        06-04-16
3   toronto         06-04-99

Sample Data (Output):

    SOURCE_DATASET  START_DATE  YEAR_TORONTO    YEAR_TORONTO2
0   brampton        06-04-16    NaN             NaN 
1   toronto         06-04-16    2016.0          2016.0  
2   brampton        06-04-16    NaN             NaN 
3   toronto         06-04-99    1999.0          1999.0  

I just tried np.where as well and I'm getting the same result.

Ricardo Francois
  • Can you show some sample data? And, hopefully this is not a big dataset since you are applying these operations element-wise, instead of by series. – ako Nov 27 '20 at 04:14
  • Just added some sample data. Is there a problem with me doing this element wise? The records I'll get should only have 4 rows max but what would the problem be if there was more data? – Ricardo Francois Nov 27 '20 at 04:21
  • There are NaNs, so by default all values are floats. For integers with `NaN`s you need `result["YEAR_TORONTO"] = result["YEAR_TORONTO"].astype('Int64')` – jezrael Nov 27 '20 at 05:13
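
Putting the two comments above together (work on whole Series rather than element by element, and use the nullable `Int64` dtype), a minimal sketch on the sample data might look like this. The date format `%m-%d-%y` is an assumption, since the question does not say whether `06-04-16` is month-first or day-first; the extracted year is the same either way. The name `is_toronto` is introduced here for illustration only.

```python
import pandas as pd

# Sample input from the question.
result = pd.DataFrame({
    "SOURCE_DATASET": ["brampton", "toronto", "brampton", "toronto"],
    "START_DATE": ["06-04-16", "06-04-16", "06-04-16", "06-04-99"],
})

# Boolean mask selecting the rows of interest (hypothetical helper name).
is_toronto = result["SOURCE_DATASET"] == "toronto"

# Parse the whole column at once; the format string is an assumption.
years = pd.to_datetime(result["START_DATE"], format="%m-%d-%y").dt.year

# Blank out non-toronto rows, then switch to the nullable integer dtype
# so missing values are stored as <NA> instead of forcing float64.
result["YEAR_TORONTO"] = years.where(is_toronto).astype("Int64")
result["YEAR_TORONTO2"] = result["YEAR_TORONTO"]

print(result)
print(result.dtypes)
```

Note the capital `I` in `"Int64"`: the lowercase `"int64"` is the plain NumPy dtype, which cannot hold missing values.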

1 Answer

Your approach of using astype() is right, but it does not work if your column contains NaN. You could try to split first:

result["YEAR_TORONTO"].astype(str).str.split('.', expand=True)[0].tolist()

And then take it from there.

Alternatively:

result.loc[result['TORONTO'].notnull(), 'x'] = result.loc[result['TORONTO'].notnull(), 'x'].apply(int)
  • Just tried including the first line of code and still getting a float with the same output as before. How does the second line work? I keep getting an error for the `'x'` – Ricardo Francois Nov 27 '20 at 04:23
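
For what it's worth, here is a small sketch of why writing integers back into the non-null rows probably still shows floats: a dtype belongs to the whole column, so as long as the column also holds NaN it stays float64 and the integers are cast back on assignment. The Series below is a stand-in built from the sample output in the question, and `mask` and `converted` are names made up for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the YEAR_TORONTO column from the sample output.
s = pd.Series([np.nan, 2016.0, np.nan, 1999.0], name="YEAR_TORONTO")

# Convert only the non-null entries to int and write them back.
mask = s.notnull()
s.loc[mask] = s.loc[mask].apply(int)

# The column still contains NaN, so its dtype is unchanged.
print(s.dtype)

# Converting the whole column to the nullable dtype is what changes the display.
converted = s.astype("Int64")
print(converted)
```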