I am looking for a clever way to produce a plot styled like this rather childish example: [image]

with source data like this:

days = ['Monday','Tuesday','Wednesday','Thursday','Friday']


   Feature                   Values                             observed on
0        1  [5.5, 14.3, 12.0, 11.8]  [Tuesday, Wednesday, Thursday, Friday]
1        2        [6.1, 14.6, 12.7]            [Monday, Tuesday, Wednesday]
2        3             [15.2, 13.3]                       [Tuesday, Friday]
3        4       [14.9, 14.3, 17.0]              [Monday, Thursday, Friday]
4        5  [13.0, 13.1, 13.5, 10.3]     [Monday, Tuesday, Thursday, Friday]
5        6              [12.5, 7.0]                     [Wednesday, Friday]

In other words, for each line of this dataframe, I want to plot/connect the values for the "days" on which they were acquired. (Please note the days are here just to illustrate my problem, using datetime is not a solution.) But I got lost in indexing.

This is how I prepared the figure (i.e. having vertical black lines for each day)

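# one vertical black line per day (allvalues is defined elsewhere)
for count, day in enumerate(days):
    plt.plot(np.ones(len(allvalues)) * count, np.array(allvalues), 'k', linestyle='-', linewidth=1.)
# the tick labels only need to be set once, so this sits outside the loop
plt.xticks(np.arange(0, 5, 1), ['M', 'T', 'W', 'T', 'F'])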

This works: I get my vertical lines and the labels. (Later I may want to plot other datasets instead of those vertical lines, but for now the vertical lines are more illustrative.) But now, how can I plot the values for each day? I tried:

for index, group in observations.iterrows():
    whichdays= group['observed on']
    values = group['Values']
    for d in whichdays:
        plt.plot(days[np.where(days==d)],values)

but this produces TypeError: list indices must be integers, not tuple
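
For reference, the error arises because `days` is a plain Python list: `days == d` compares the whole list with a string and evaluates to `False`, `np.where(False)` returns a tuple of arrays, and a list cannot be indexed with a tuple. A minimal sketch of one way around it, assuming each row's values are listed in the same order as its days, is to translate the day names into x-positions with `list.index`. The `rows` list below is a hypothetical stand-in for the rows of `observations`, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# Hypothetical stand-in for the first three rows of `observations`:
# (values, days on which they were observed)
rows = [
    ([5.5, 14.3, 12.0, 11.8], ['Tuesday', 'Wednesday', 'Thursday', 'Friday']),
    ([6.1, 14.6, 12.7], ['Monday', 'Tuesday', 'Wednesday']),
    ([15.2, 13.3], ['Tuesday', 'Friday']),
]

fig, ax = plt.subplots()
for values, whichdays in rows:
    # map each day name to its integer position on the x axis
    x = [days.index(d) for d in whichdays]
    # one call per row connects all of that row's points
    ax.plot(x, values, marker='o')

# label the integer positions with the day initials
ax.set_xticks(range(len(days)))
ax.set_xticklabels(['M', 'T', 'W', 'T', 'F'])
```

With the real dataframe, the same loop body should work inside `for index, group in observations.iterrows():`, using `group['Values']` and `group['observed on']`.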

durbachit

1 Answer

One possible solution is to flatten the values from the lists, pivot, and then plot:

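from itertools import chain

import numpy as np
import pandas as pd

# repeat each Feature once per value in its list, then flatten both list columns
df2 = pd.DataFrame({
        "Feature": np.repeat(df.Feature.values, df.Values.str.len()),
        "Values": list(chain.from_iterable(df.Values)),
        "observed on": list(chain.from_iterable(df['observed on']))})
print(df2)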
    Feature  Values observed on
0         1     5.5     Tuesday
1         1    14.3   Wednesday
2         1    12.0    Thursday
3         1    11.8      Friday
4         2     6.1      Monday
5         2    14.6     Tuesday
6         2    12.7   Wednesday
7         3    15.2     Tuesday
8         3    13.3      Friday
9         4    14.9      Monday
10        4    14.3    Thursday
11        4    17.0      Friday
12        5    13.0      Monday
13        5    13.1     Tuesday
14        5    13.5    Thursday
15        5    10.3      Friday
16        6    12.5   Wednesday
17        6     7.0      Friday

# reshape to one row per day and one column per Feature
df = df2.pivot(index='observed on', columns='Feature', values='Values')
df.index.name = None
df.columns.name = None
print(df)
              1     2     3     4     5     6
Friday     11.8   NaN  13.3  17.0  10.3   7.0
Monday      NaN   6.1   NaN  14.9  13.0   NaN
Thursday   12.0   NaN   NaN  14.3  13.5   NaN
Tuesday     5.5  14.6  15.2   NaN  13.1   NaN
Wednesday  14.3  12.7   NaN   NaN   NaN  12.5

df.plot(linestyle='-',linewidth=1.)

[image: graph]
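
Note that the pivoted index comes out in alphabetical order (Friday, Monday, ...), so the x axis of the plot follows that order too. A minimal sketch of restoring weekday order, assuming the `days` list from the question is available, is to reindex the pivoted frame against it. The small `df2` here is a hypothetical cut-down version of the flattened frame above, and `wide` is just an illustrative name.

```python
import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# Hypothetical cut-down version of the flattened frame
df2 = pd.DataFrame({
    "Feature": [1, 1, 2, 2],
    "Values": [5.5, 11.8, 6.1, 14.6],
    "observed on": ['Tuesday', 'Friday', 'Monday', 'Tuesday'],
})

wide = df2.pivot(index='observed on', columns='Feature', values='Values')

# reorder the rows to follow the weekday order; days with no
# observations at all become rows filled with NaN
wide = wide.reindex(days)
print(wide)
```

Calling `wide.plot(linestyle='-', linewidth=1.)` afterwards should then lay the days out from Monday to Friday.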

jezrael
  • Is there a simple way to maintain the order of days without reindexing? – durbachit Mar 30 '17 at 07:33
  • I am working on it, give me some time – jezrael Mar 30 '17 at 07:33
  • Hmmm, indexing is not the problem: `for i, d in enumerate(whichdays): print(d); print(values[i])`, but I think it is better not to use lists, since they make this more complicated. – jezrael Mar 30 '17 at 07:50
  • I got stuck for a while because it works perfectly for the small example but when I use it on the big dataset, I get `ValueError: arrays must all be same length`. Now I realized why - it treats the 'observed on' as strings, not like lists! e.g. for `['Tuesday', 'Friday']` I would expect the `len` to be 2, but when I print it, it's actually 21, which is causing the error with lengths. `list(['Tuesday', 'Friday'])` doesn't do anything and `.astype(list)` or `.tolist()` throw AttributeError. – durbachit Mar 31 '17 at 00:22
  • Update on my previous comment: I tried http://stackoverflow.com/a/23112008/5553319 using the `literal_eval` function....It worked for 'observed on' and for values (values then needed to be converted to floats, but there was no problem) and the last but pressing problem that I have is with 'Feature', which is 'long' and I am not able to convert it into an integer... because it looks like `'long' object has no attribute 'astype'` and `'long' object is not iterable` – durbachit Mar 31 '17 at 01:22
  • Another update: dtypes are now fixed, after looping through the dataframe I always print the lengths of 'observed on' and 'Values' (they all correspond now), but I'm still getting the `ValueError: arrays must all be same length` - what am I missing? – durbachit Mar 31 '17 at 01:43
  • I think the problem is that in some rows the two list columns are not the same length, e.g. the first row holds `[5.5, 14.3, 12.0]` instead of `[5.5, 14.3, 12.0, 11.8]`. Flattening the values then gives output arrays of different lengths, so the first array has length 17 and the second 18. – jezrael Mar 31 '17 at 05:58
  • I checked it, this is not the case. The real struggle is still with the data types (take lists of strings as lists and not as strings, take list of integers as a list of integers and not a long). But once I manage to fix them, it will work, thanks! – durbachit Apr 02 '17 at 04:28
  • Super, glad I could help. If you want, you can upvote too. Thanks. – jezrael Apr 02 '17 at 04:31
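
The dtype problem described in the comments (list columns that are really strings) can be sketched as follows. This is a minimal illustration under assumptions, not the asker's actual data: `raw` is a hypothetical frame whose list columns were read in as text, and `mismatched` is an illustrative name. `ast.literal_eval` turns each string back into a real list, after which the per-row lengths can be compared to find rows that would trigger `ValueError: arrays must all be same length`.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame whose list columns were loaded as plain strings
raw = pd.DataFrame({
    "Feature": [1, 3],
    "Values": ["[5.5, 14.3]", "[15.2, 13.3]"],
    "observed on": ["['Tuesday', 'Wednesday']", "['Tuesday', 'Friday']"],
})

# len() counts characters here, not list items
string_length = len(raw.loc[1, "observed on"])

# parse each string back into a Python list
for col in ["Values", "observed on"]:
    raw[col] = raw[col].apply(literal_eval)

# rows where the two lists disagree in length would break the flattening step
mismatched = raw[raw["Values"].str.len() != raw["observed on"].str.len()]
print(mismatched)
```

An empty `mismatched` frame means every row has as many values as days.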