How to assign binary values to values in a csv column in python?

Question

I have a dataframe of index values and want to add a column dummy with ones and zeros depending on the value in index_value. The data frame looks like:

        Date        index_value
0   0   8/1/2003    -0.33
1   1   9/1/2003    -0.37
2   2   10/1/2003   -0.42
3   3   11/1/2003    0.51
4   4   12/1/2003   -0.51
5   5   1/1/2004    -0.49
6   6   2/1/2004     0.68
7   7   3/1/2004    -0.58
8   8   4/1/2004    -0.57
9   9   5/1/2004    -0.47
10  10  6/1/2004    -0.67
11  11  7/1/2004    -0.59
12  12  8/1/2004     0.6
13  13  9/1/2004    -0.63
14  14  10/1/2004   -0.48
15  15  11/1/2004   -0.55
16  16  12/1/2004   -0.64
17  17  1/1/2005     0.68
18  18  2/1/2005    -0.81
19  19  3/1/2005    -0.68
20  20  4/1/2005    -0.48
21  21  5/1/2005    -0.48

and I want to create a dummy that gives a 1 if the index value is greater than 0.5 and a 0 otherwise. My code so far is:

df = pd.read_csv("index.csv", parse_dates=True)
df['dummy']=df['index_value']...

df = ....to_csv("indexdummy.csv")

But I have no idea how to assign the dummy variable. My expected output for the column dummy would be: 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0

possible duplicate of: https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column — SM Abu Taher Asif, Jul 12 '19 at 07:42

score 1 · Accepted Answer · answered Jul 12 '19 at 07:35

Compare the column with Series.gt and cast the resulting boolean mask to integers:

df['dummy'] = df['index_value'].gt(.5).astype(int)
# alternative (needs: import numpy as np)
# df['dummy'] = np.where(df['index_value'].gt(.5), 1, 0)

# if you need to compare the DataFrame's index values instead
# df['dummy'] = (df.index > .5).astype(int)
print(df)
            Date  index_value  dummy
0  0    8/1/2003        -0.33      0
1  1    9/1/2003        -0.37      0
2  2   10/1/2003        -0.42      0
3  3   11/1/2003         0.51      1
4  4   12/1/2003        -0.51      0
5  5    1/1/2004        -0.49      0
6  6    2/1/2004         0.68      1
7  7    3/1/2004        -0.58      0
8  8    4/1/2004        -0.57      0
9  9    5/1/2004        -0.47      0
10 10   6/1/2004        -0.67      0
11 11   7/1/2004        -0.59      0
12 12   8/1/2004         0.60      1
13 13   9/1/2004        -0.63      0
14 14  10/1/2004        -0.48      0
15 15  11/1/2004        -0.55      0
16 16  12/1/2004        -0.64      0
17 17   1/1/2005         0.68      1
18 18   2/1/2005        -0.81      0
19 19   3/1/2005        -0.68      0
20 20   4/1/2005        -0.48      0
21 21   5/1/2005        -0.48      0
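
For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch of the same approach. The five rows are copied from the top of the question's data; the in-memory buffer `buf` is only an illustrative stand-in for writing to "indexdummy.csv".

```python
import io

import numpy as np
import pandas as pd

# First five rows of the question's data, built in memory instead of read from index.csv.
df = pd.DataFrame({
    "Date": ["8/1/2003", "9/1/2003", "10/1/2003", "11/1/2003", "12/1/2003"],
    "index_value": [-0.33, -0.37, -0.42, 0.51, -0.51],
})

# gt(0.5) gives a boolean mask (True where index_value > 0.5); astype(int) turns it into 0/1.
df["dummy"] = df["index_value"].gt(0.5).astype(int)

# The np.where alternative produces the same values as a NumPy array.
dummy_where = np.where(df["index_value"] > 0.5, 1, 0)

# to_csv returns None when it is given a path or buffer, so call it on its own line
# rather than assigning its result back to df.
buf = io.StringIO()  # hypothetical stand-in for "indexdummy.csv"
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
```

Note that the comparison is strict, so a value of exactly 0.5 would get a 0.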

jezrael

Thanks! And what can I do if the index value is >0.5 AND <0.5? – Dennis Jul 12 '19 at 08:28
@Dennis - Do you mean `df['index_value'].between(0, .5, inclusive=False).astype(int)` - [`Series.between`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.between.html) ? – jezrael Jul 12 '19 at 08:29
@Dennis If you need to update your requirements then update your question, don't try to edit the answer. – Nick is tired Jul 12 '19 at 08:57
@Dennis - you need `df['dummy'] = (~df['index_value'].between(-.5,.5)).astype(int)` – jezrael Jul 12 '19 at 08:59
@Dennis - or `df['dummy'] = np.where(df['index_value'].between(-.5,.5),0,1)` – jezrael Jul 12 '19 at 08:59
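
A short sketch of the two range variants discussed in these comments, on made-up sample values (the series `values` is illustrative, not the question's data). It assumes the follow-up means "flag values outside the band from -0.5 to 0.5". The strict 0-to-0.5 case is written with plain comparisons here because the accepted values of the `inclusive` argument of `Series.between` have changed between pandas versions (newer releases, as far as I know, expect a string such as "neither" rather than a boolean).

```python
import pandas as pd

# Illustrative values, including the boundary value 0.5.
values = pd.Series([-0.33, 0.51, -0.51, 0.25, -0.81, 0.5], name="index_value")

# between() includes both endpoints by default, so exactly -0.5 or 0.5 counts as inside.
inside_band = values.between(-0.5, 0.5)

# 1 where the value lies outside [-0.5, 0.5], 0 otherwise.
outside_dummy = (~inside_band).astype(int)

# 1 where the value lies strictly between 0 and 0.5 (both endpoints excluded).
strict_dummy = ((values > 0) & (values < 0.5)).astype(int)
```

Whether the boundary values should count as inside or outside is a choice; swap `<` for `<=` (or negate explicit comparisons) if the endpoints should be treated differently.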
