
My Dataset is...

value
0.486903
0.520908
0.530904
0.483284
0.475935
0.502831
0.541743
0.566318
0.500073
0.510959
0.546008
0.551682
0.534396
0.501554
0.541277

I want to replace these values in my dataset with the categories below. Could you please provide the Python code required?

Categories: 0.470000-0.500000 = 1, 0.500001-0.530000 = 2, 0.530001-0.560000 = 3

Please also show how to write the modified data back to a CSV file.

Ismael Padilla
NELSON H
  • I reopened the question because it combines `cut` with `±inf` bins and `to_csv`. – jezrael Sep 18 '17 at 05:51
  • I tried to find a duplicate for it, but without success. – jezrael Sep 18 '17 at 05:52
  • It's just the same thing but with labels, next time please discuss before using your badge to reopen. You can also alert the user who closed like this: @Zero – cs95 Sep 18 '17 at 06:00
  • @cᴏʟᴅsᴘᴇᴇᴅ - I got no notification of your comment. So do you think it is the same? I searched for `pd.cut` with `to_csv` using [this](https://stackoverflow.com/search?q=%5Bpandas%5D+pd.cut+df.to_csv) but had no success. I think it is only half a duplicate. What do you think? – jezrael Sep 18 '17 at 06:10
  • @jezrael the important thing is pd.cut which is answered. For saving to CSV, that's one google search/basic knowledge that anyone can find anywhere ... it doesn't have to be 100% dupe, as long as it is over 50% (this is over 90% a dupe) it's okay – cs95 Sep 18 '17 at 06:11
  • @NELSONH Not being able to help yourself says a lot about you, not me. – cs95 Sep 19 '17 at 12:01

1 Answer


Use cut. Two extra groups are added for values below 0.47 and above 0.56, because the sample data contains the value 0.566318, which lies outside the requested ranges.

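import numpy as np
import pandas as pd

# df is assumed to be a DataFrame with the 'value' column from the question
bins = [-np.inf, .47, 0.5, .53, .56, np.inf]
labels = [0, 1, 2, 3, 4]
df['label'] = pd.cut(df['value'], bins=bins, labels=labels)
print(df)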
       value label
0   0.486903     1
1   0.520908     2
2   0.530904     3
3   0.483284     1
4   0.475935     1
5   0.502831     2
6   0.541743     3
7   0.566318     4
8   0.500073     2
9   0.510959     2
10  0.546008     3
11  0.551682     3
12  0.534396     3
13  0.501554     2
14  0.541277     3
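
To illustrate how values that sit exactly on a bin edge are handled, here is a minimal sketch. The edge-case values are made up for the demonstration and are not part of the question's data. By default `pd.cut` uses right-closed intervals, so 0.5 falls into the lower bin (label 1) and 0.47 itself falls into the extra group 0. If 0.47 should belong to category 1 instead, passing `right=False` makes the intervals left-closed, at the cost of moving 0.5 into category 2.

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 0.47, 0.5, 0.53, 0.56, np.inf]
labels = [0, 1, 2, 3, 4]

# Hypothetical edge-case values, chosen to sit on or next to the bin edges
edge = pd.DataFrame({'value': [0.47, 0.5, 0.500001, 0.56, 0.6]})

# Default right=True: each bin is (left, right], so an edge value goes to the lower bin
edge['label'] = pd.cut(edge['value'], bins=bins, labels=labels)

edge_labels = edge['label'].astype(int).tolist()
print(edge_labels)  # [0, 1, 2, 3, 4]
```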

Numpy solution:

bins = [-np.inf, .47, 0.5, .53, .56, np.inf]
df['label'] = np.array(bins).searchsorted(df['value']) - 1
print(df)
       value  label
0   0.486903      1
1   0.520908      2
2   0.530904      3
3   0.483284      1
4   0.475935      1
5   0.502831      2
6   0.541743      3
7   0.566318      4
8   0.500073      2
9   0.510959      2
10  0.546008      3
11  0.551682      3
12  0.534396      3
13  0.501554      2
14  0.541277      3
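
As a quick sanity check, the sketch below rebuilds the question's sample data and compares the two approaches above. It assumes the same `bins` as in the answer; the variable names `cut_labels` and `ss_labels` are only for illustration. One difference worth knowing: `pd.cut` returns a categorical column, while `searchsorted` returns plain integers.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question
values = [0.486903, 0.520908, 0.530904, 0.483284, 0.475935,
          0.502831, 0.541743, 0.566318, 0.500073, 0.510959,
          0.546008, 0.551682, 0.534396, 0.501554, 0.541277]
df = pd.DataFrame({'value': values})

bins = [-np.inf, 0.47, 0.5, 0.53, 0.56, np.inf]

# pandas approach: categorical labels, converted to int for comparison
cut_labels = pd.cut(df['value'], bins=bins, labels=[0, 1, 2, 3, 4]).astype(int)

# NumPy approach: position of each value among the bin edges, shifted by one
ss_labels = np.array(bins).searchsorted(df['value'].to_numpy()) - 1

same = bool((cut_labels.to_numpy() == ss_labels).all())
print(same)  # True
```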

Finally, write the result to a CSV file with to_csv:

df.to_csv('myfile', index=False)
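
For completeness, here is a sketch of the whole round trip: read a CSV, add the labels, and write the result back. The file names `input.csv` and `output.csv` are hypothetical, and a temporary directory is used only so the example runs anywhere; in practice you would point `read_csv` and `to_csv` at your own files.

```python
import os
import tempfile

import numpy as np
import pandas as pd

bins = [-np.inf, 0.47, 0.5, 0.53, 0.56, np.inf]
labels = [0, 1, 2, 3, 4]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')    # hypothetical input file
    dst = os.path.join(tmp, 'output.csv')   # hypothetical output file

    # Create a small input file from three of the question's values
    pd.DataFrame({'value': [0.486903, 0.520908, 0.566318]}).to_csv(src, index=False)

    # Read, label, and write back without the index column
    df = pd.read_csv(src)
    df['label'] = pd.cut(df['value'], bins=bins, labels=labels)
    df.to_csv(dst, index=False)

    # Read the written file again to confirm its contents
    result = pd.read_csv(dst)

print(result)
```

Note that `index=False` keeps the row index out of the file, so the CSV contains only the `value` and `label` columns.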
jezrael