
I have taken a specific column from a larger dataset and would like to assign the value 0 to negative numbers and 1 to numbers >= 0.

This code extracts that column from the larger dataset:

r = data[['return']]
r.head()

This is the code I have already tried:

for num in r:
    if num >= 0:
        num = 1
    else:
        num = 0

It did not work; instead it raised `TypeError: '>=' not supported between instances of 'str' and 'int'`.

Nn Nn
  • Your data seems to be strings. Please provide it. [mcve] – Patrick Artner Mar 23 '19 at 23:07
  • I would recommend using NumPy's `where`. – Sheldore Mar 23 '19 at 23:09
  • Possible duplicate of [TypeError: '<=' not supported between instances of 'str' and 'int'](https://stackoverflow.com/questions/41950021/typeerror-not-supported-between-instances-of-str-and-int) – Peter Wood Mar 23 '19 at 23:10
  • Provided you convert your values to float or int type, using `r=np.array(list(map(float, r)))`, you can then just do `r = np.where(r >= 0, 1, 0)` – Sheldore Mar 23 '19 at 23:11
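Sheldore's `np.where` suggestion can also be applied to the column directly. A minimal sketch, assuming `data` has a `return` column of numbers stored as strings (the sample values are hypothetical). Selecting with single brackets gives a Series, which converts cleanly with `astype(float)`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the 'return' column holds numbers stored as strings
data = pd.DataFrame({'return': ['-0.5', '0', '1.2']})

values = data['return'].astype(float)  # single brackets -> Series, then cast to float
flags = np.where(values >= 0, 1, 0)    # 1 where the value is >= 0, otherwise 0

print(flags)  # [0 1 1]
```

Note that `np.where` returns a NumPy array, not a Series; assign it back with `data['flag'] = flags` if you want it in the DataFrame.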

4 Answers


It is possible that the values you store in `data` are strings. To confirm this, you can add this inside your loop:

print(type(num))

If it prints `<class 'str'>`, then your data is stored in `data` as strings.
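If `data` is a pandas DataFrame, you can also check the whole column instead of one value at a time. A minimal sketch with hypothetical sample values, using `pd.api.types.is_numeric_dtype`:

```python
import pandas as pd

# Hypothetical sample data with numbers stored as strings
data = pd.DataFrame({'return': ['-1', '0', '2']})

print(pd.api.types.is_numeric_dtype(data['return']))  # False: the column is not numeric
print(type(data['return'].iloc[0]))                   # <class 'str'>
```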

There are two fixes for this:
1. Store integers in `data` in the first place.
2. If you can't control how you get the data, cast each value to an integer and then do the check.
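For fix 2 with pandas, one option is `pd.to_numeric`, which converts the whole column at once. A minimal sketch with hypothetical sample data: with `errors='coerce'`, a value that cannot be parsed (the `'n/a'` entry here) becomes NaN instead of raising an error:

```python
import pandas as pd

# Hypothetical sample data; 'n/a' stands in for a value that is not a number
data = pd.DataFrame({'return': ['-1', '0', '2.5', 'n/a']})

# Convert strings to numbers; unparseable entries become NaN
data['return'] = pd.to_numeric(data['return'], errors='coerce')

print(data['return'].isna().tolist())  # [False, False, False, True]
```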

Assuming you are a new programmer: for future reference, this kind of error is called a type error. It means the types of your operands are not compatible with the operator. In this case, Python has no `>=` comparison defined between a `str` (`num`) and an `int` (`0`), so it raises a `TypeError`.
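A short demonstration of the error and the cast that fixes it, assuming `num` holds a numeric string as in the question:

```python
num = '3'  # a string, as in the question

try:
    result = num >= 0  # str vs int: Python raises TypeError
except TypeError as err:
    message = str(err)
    print(message)     # '>=' not supported between instances of 'str' and 'int'

print(int(num) >= 0)   # True: after casting, both operands are numbers
```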

On a side note, it looks like you are trying to update the members of your list. But the way you are looping through the list right now, you won't be able to update the elements: assigning to `num` only rebinds the loop variable. If you printed the list after the loop, you would notice that `r` hasn't changed at all. Here is a good Stack Overflow question for reference: How to modify list entries during for loop?
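A minimal sketch of that behavior, assuming `r` is a plain list of numeric strings:

```python
r = ['-1', '0', '2']

for num in r:
    # Rebinds the loop variable only; the list itself is untouched
    num = 1 if int(num) >= 0 else 0

print(r)  # ['-1', '0', '2']
```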

To fix this, follow the example below.

for idx, num in enumerate(r):
    if int(num) >= 0:
        r[idx] = '1' # Note that you will be storing a string again
    else:
        r[idx] = '0'
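If you don't need to modify the list in place, a list comprehension that builds a new list is a common alternative. A minimal sketch, assuming `r` is a plain list of numeric strings; it uses `float()` so decimal values also work, and it stores integers rather than strings:

```python
r = ['-1.5', '0', '2']

# Build a new list: 1 for values >= 0, 0 for negatives
r = [1 if float(num) >= 0 else 0 for num in r]

print(r)  # [0, 1, 1]
```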

Hope that works out! Cheers!


You need to convert each string to an integer with `int()`, e.g. `int('2')`:

r = ['0','1','-1']
for num in r:
    number = int(num)
    if number >= 0:
        number = 1
    else:
        number = 0
    print(number)
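The same per-value conversion can be applied to a whole pandas column with `Series.map`. A sketch with hypothetical sample data; `to_flag` is an illustrative helper name, not part of pandas:

```python
import pandas as pd

def to_flag(value):
    """Hypothetical helper: 1 for numbers >= 0, 0 for negatives."""
    return 1 if float(value) >= 0 else 0

data = pd.DataFrame({'return': ['0', '1', '-1']})
data['flag'] = data['return'].map(to_flag)  # apply to_flag to every value

print(data['flag'].tolist())  # [1, 1, 0]
```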

Devesh Kumar Singh
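import numpy as np
import pandas as pd

r = pd.Series(['1', '2', '-1'])
r = r.astype(float)  # convert the strings to floats

r[r >= 0] = 1  # set all values >= 0 to 1
r[r < 0] = 0   # set the remaining (negative) values to 0
# OR r = np.where(r >= 0, 1, 0)  # note: returns a NumPy array, not a Series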
  • Convert to float
  • Index all values >= 0 and set them to 1
  • Index all values < 0 and set them to 0
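Since the comparison `r >= 0` already produces a boolean Series, casting it with `astype(int)` maps `True` to 1 and `False` to 0 in one step, and the result stays an integer Series. A sketch with the same sample values:

```python
import pandas as pd

r = pd.Series(['1', '2', '-1']).astype(float)

flags = (r >= 0).astype(int)  # True -> 1, False -> 0

print(flags.tolist())  # [1, 1, 0]
```

One caveat: any NaN in `r` compares as `False`, so it would become 0 here.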
mujjiga

Suppose the `return` column contains numeric values, like below:

import pandas as pd

data_dict = {'return': [-1, 0, 2], 'col2': [10, 11, 12]}
data = pd.DataFrame(data_dict)

r = data[['return']]
r.head()

for num in r:
    if num >= 0:
        num = 1
    else:
        num = 0

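This still gives `TypeError: '>=' not supported between instances of 'str' and 'int'`. The reason is that `data[['return']]` (note the double brackets) returns a DataFrame, and iterating over a DataFrame goes through its column labels, which are strings. So `num` is the string `'return'`, not a number.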
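A quick way to see this with the sample frame above: iterating the double-bracket selection yields column labels, while single-bracket selection returns a Series whose iteration yields the values. The loop at the end is a sketch of the question's loop working once it iterates the Series:

```python
import pandas as pd

data = pd.DataFrame({'return': [-1, 0, 2], 'col2': [10, 11, 12]})

print(list(data[['return']]))  # ['return']  (DataFrame: iterates column labels)
print(list(data['return']))    # [-1, 0, 2]  (Series: iterates values)

# The original loop works once it iterates the Series
flags = []
for num in data['return']:
    flags.append(1 if num >= 0 else 0)

print(flags)  # [0, 1, 1]
```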

I think a nicer solution is to use vectorized boolean indexing with `.loc` instead of a for loop. But it gives a warning when changing the same column:

r.loc[r['return'] >= 0,'return'] = 1
r.loc[r['return'] < 0,'return'] = 0

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

So you might create a new column instead:

r.loc[r['return'] >= 0, 'return2'] = 1
r.loc[r['return'] < 0, 'return2'] = 0
r['return2'] = r['return2'].astype('int')
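Another option, sketched below with the same sample data, is to take an explicit copy when slicing. `r` is then an independent DataFrame rather than a possible view of `data`, so pandas does not emit `SettingWithCopyWarning`, and the `return` column of `r` can be updated in place:

```python
import pandas as pd

data = pd.DataFrame({'return': [-1, 0, 2], 'col2': [10, 11, 12]})

r = data[['return']].copy()  # explicit copy: r no longer refers back to data

r.loc[r['return'] >= 0, 'return'] = 1  # values >= 0 become 1
r.loc[r['return'] < 0, 'return'] = 0   # remaining negatives become 0

print(r['return'].tolist())  # [0, 1, 1]
```

`data` itself is left unchanged by this; write the result back with `data['return'] = r['return']` if that is what you want.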
Bas