enter image description here

enter image description here

Both plots were produced with this code:

ax = sns.lineplot(x='Number of env steps total', y=y, hue="Experiment", style="Experiment", palette=palette, data=df, ax=axs[idx, 0], hue_order=hue_ordering, dashes=dash_styles, ci="sd", estimator='mean')

The only difference is that in the second plot I relabel all "Experiment" rows in the "df" dataframe from "seed[0,1,2]" to "relational_sequential (Ours)". However, the dark orange part should not be a giant clump in the second picture; the mean should still be a line. Why does it look like this? In the second plot we have three modes of data, but the mean should still be a single clean dark orange line, not a clumped block of orange, right?

user3180
  • As an addition to Patol75's answer, you could sample a subset of your data for sparser plots. – warped Dec 04 '19 at 07:52

2 Answers


After renaming, your three different datasets are identified by the same name. They are therefore plotted as one dataset, which is why you get something clumpy. Try giving each of them a different name, for example relational_sequential_1 (This Study), relational_sequential_2 (This Study), relational_sequential_3 (This Study).

Patol75
  • Shouldn't the mean be a thin line? My point in aggregating the three datasets is that I want to represent them with a thin mean line and a light-colored standard deviation area. – user3180 Dec 04 '19 at 22:54
  • But then changing the name is not enough; you need to collapse the three rows into one. This [thread](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) could help you. – Patol75 Dec 04 '19 at 22:59
  • I must not understand something because I thought each row should represent one value... we can make 3 rows into 1 row by taking the mean, but then we lose the standard deviation information – user3180 Dec 04 '19 at 23:07
  • I don't think that is the way pandas works. I think it works by names. So if three rows have the same name, they are considered one dataset. And if they are sampled at the same x coordinates, you end up with multiple y values for the same x, and your plot looks the way it does. I don't think taking the mean is too bad; you can simply calculate the standard deviation again afterwards. – Patol75 Dec 04 '19 at 23:14
  • So what I need to do seems to be creating multiple columns for y values... should these columns have the same name? Or how do they get associated together for the mean aggregator? – user3180 Dec 04 '19 at 23:22
  • That's where my limited use of pandas comes into play, I guess... I don't really know. If you share your data, I am happy to play with it and make it work, but off the top of my head I don't know exactly what the best approach is. – Patol75 Dec 04 '19 at 23:40
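
For what the comments above call collapsing the rows, here is a minimal sketch that computes the mean and the standard deviation in one pass, so the spread is not lost. The column names ("step", "seed", "reward") and the values are hypothetical toy data, not the asker's dataframe, and it assumes all seeds share the same x values.

```python
import pandas as pd

# Hypothetical long-form data: one row per (seed, step) observation.
df = pd.DataFrame({
    "step":   [0, 0, 0, 100, 100, 100],
    "seed":   [0, 1, 2, 0, 1, 2],
    "reward": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Collapse the three seeds into one row per step, keeping both statistics.
# pandas' "std" is the sample standard deviation (ddof=1).
summary = df.groupby("step")["reward"].agg(["mean", "std"]).reset_index()
print(summary)
```

A table like this could then be drawn by hand, for instance a line for the mean and a shaded band of mean ± std with matplotlib's fill_between.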

I found the problem: the seeds need to be sampled at the same frequency. Upsample or downsample them to the same frequency.
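
One way to do this is sketched below: each seed is linearly interpolated onto a shared x grid, so every x value has exactly one y per seed and the estimator has something to average. The helper name resample_to_common_grid, the "reward" and "seed" columns, and the toy data are assumptions for illustration; linear interpolation is only one possible resampling choice.

```python
import numpy as np
import pandas as pd

def resample_to_common_grid(df, x_col, y_col, run_col, num_points=50):
    """Interpolate every run onto one shared x grid (hypothetical helper)."""
    # Restrict the grid to the x range covered by all runs, to avoid extrapolating.
    lo = df.groupby(run_col)[x_col].min().max()
    hi = df.groupby(run_col)[x_col].max().min()
    grid = np.linspace(lo, hi, num_points)
    frames = []
    for run, group in df.groupby(run_col):
        group = group.sort_values(x_col)  # np.interp needs increasing x
        y_interp = np.interp(grid, group[x_col].to_numpy(), group[y_col].to_numpy())
        frames.append(pd.DataFrame({x_col: grid, y_col: y_interp, run_col: run}))
    return pd.concat(frames, ignore_index=True)

# Toy data (assumed): three seeds logged at different step intervals.
rng = np.random.default_rng(0)
runs = []
for seed, step in enumerate([100, 130, 170]):
    x = np.arange(0, 2001, step)
    y = np.sqrt(x) + rng.normal(0, 1, x.size)
    runs.append(pd.DataFrame({"Number of env steps total": x, "reward": y, "seed": seed}))
raw = pd.concat(runs, ignore_index=True)

aligned = resample_to_common_grid(raw, "Number of env steps total", "reward", "seed")
aligned["Experiment"] = "relational_sequential (Ours)"
```

Passing a frame like aligned as data to the sns.lineplot call from the question should then give one mean line with a standard deviation band, since every x value now has three y values to aggregate.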

user3180