
I have a CSV file with a column that contains both positive and negative values. I need to plot the mean of this column as two bars: one for the negative values and one for the positive values. Here is a sample of my data:

timestamp,heure,lat,lon,ampl,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

I am using this code to plot my data:

import pandas as pd

names = ["timestamp", "heure", "lat", "lon", "ampl", "type"]
data = pd.read_csv('flash.txt', names=names, parse_dates=['timestamp'], index_col=['timestamp'], dayfirst=True)
data['ampl'] = data['ampl'].abs()
# Count the rows that fall in each calendar month
yearly = data.groupby(data.index.month)['ampl'].count()
ax = yearly.plot(kind='bar')

So I need to split the values of this column by sign and get two bars instead of one. How can I proceed?

Mar
  • Without data it is a bit problematic, but if you change `yearly = data.groupby(data.index.month)['ampl'].count()` to `yearly = data.groupby([data.index.month, 'type'])['ampl'].count().unstack(fill_value=0)`, it should work. – jezrael Jun 22 '17 at 16:23
  • If it does not work, can you add 4-5 rows of sample data? – jezrael Jun 22 '17 at 16:24
  • I just edited my question; you can take a look at my data now. – Mar Jun 22 '17 at 16:26

1 Answer


First, create a new column `sign` with `numpy.sign` and map its values to labels with a dict.

Then add the new column name to `groupby`, aggregate with `size`, and reshape with `unstack`:

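import numpy as np
# Label each row by the sign of 'ampl' before taking absolute values
data['sign'] = np.sign(data['ampl']).map({1: '+', -1: '-', 0: '0'})
data['ampl'] = data['ampl'].abs()
# One row per month, one column per sign
yearly = data.groupby([data.index.month, 'sign'])['ampl'].size().unstack()
yearly.plot(kind='bar')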
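
Since the question asks for the mean rather than a count, the same pattern can be sketched with `mean` in place of `size`. This is only an illustration: it reads the sample rows from the question out of a string instead of `flash.txt`, and the name `monthly_mean` is made up for this example.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; the real data lives in 'flash.txt'.
csv_text = """timestamp,heure,lat,lon,ampl,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'],
                   index_col='timestamp')

# Record the sign first, then work with absolute amplitudes.
data['sign'] = np.sign(data['ampl']).map({1: '+', -1: '-', 0: '0'})
data['ampl'] = data['ampl'].abs()

# Mean absolute amplitude per month, with one column per sign.
# Months that have no rows for a given sign come out as NaN.
monthly_mean = data.groupby([data.index.month, 'sign'])['ampl'].mean().unstack()
print(monthly_mean)
```

Calling `monthly_mean.plot(kind='bar')` should then draw one bar per sign for each month, assuming matplotlib is installed.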

What is the difference between size and count in pandas?
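
In short, `size` counts every row in a group, while `count` skips missing values. A small sketch of the difference, using a made-up frame (the `NaN` row is invented for illustration and is not in the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: month 1 has one missing amplitude.
df = pd.DataFrame({
    'month': [1, 1, 2, 2],
    'ampl': [10.3, np.nan, -17.1, 36.8],
})

grouped = df.groupby('month')['ampl']
sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-missing values per group

print(sizes)
print(counts)
```

If the column has no missing values, as in the sample rows of the question, both give the same numbers.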

Graham
jezrael
  • I did as you said, and I got this: KeyError: 'type' – Mar Jun 22 '17 at 16:34
  • What does `print(data.columns.tolist())` show? – jezrael Jun 22 '17 at 16:35
  • It gives this: ['heure', 'lat', 'lon', 'ampl'] – Mar Jun 22 '17 at 16:36
  • Hmm, that is interesting, because in your sample the last column is `type`. What if you remove the `names` parameter? Change `data = pd.read_csv('flash.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'], dayfirst=True)` to `data = pd.read_csv('flash.txt', parse_dates=['timestamp'],index_col=['timestamp'], dayfirst=True)`. `names` is only needed if the CSV has no header row. How does it work now? – jezrael Jun 22 '17 at 16:40
  • It's working now, but I can't understand the meaning of the result. I'll send you the result via email, since I don't have the right to share images. – Mar Jun 22 '17 at 16:53
  • Sure, no problem. – jezrael Jun 22 '17 at 16:55
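
The point about `names` in the comments above can be checked on a small in-memory file. This is a sketch with two sample rows from the question; it shows how `names` interacts with a header row in general and does not claim to reproduce the exact `KeyError` seen in the thread.

```python
import io

import pandas as pd

# Two sample rows from the question, including the header line.
csv_text = """timestamp,heure,lat,lon,ampl,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
"""
names = ["timestamp", "heure", "lat", "lon", "ampl", "type"]

# names= without header= implies header=None, so the header line
# is read as an ordinary data row and 'ampl' becomes a string column.
with_names = pd.read_csv(io.StringIO(csv_text), names=names)

# Without names=, the first line is used as the header.
without_names = pd.read_csv(io.StringIO(csv_text))

# To keep custom names and still skip the file's own header, pass header=0.
renamed = pd.read_csv(io.StringIO(csv_text), names=names, header=0)

print(with_names['ampl'].tolist())
print(without_names['ampl'].tolist())
```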