I have a pandas dataframe in the following format:

groups  value
1       0
0       0
0       0
0       0.1
1       0.4
1       0.5
0       0.5
1       0.8
0       0.8
1       0.9
1       1
1       1
1       1
1       1
0       1
0       1

I want a sorted line plot that has the value in the y-axis, as shown here:

[Image: example graph]

I also want a similar line for each group in the same plot. (Or just the two lines for the groups, though the groups differ in size.)

Can anybody help me out? I reckon that's possible?

I use Python 3.x with pandas 0.16.2. I'd prefer using matplotlib or seaborn.

Klaster
  • Where are the x-values coming from? Show the code you used to get that graph. – JoeCondron Jul 07 '15 at 17:13
  • Hey Joe, that was just an Excel column to get an evenly divided x-axis (Jianxun solved this with `x = np.linspace(0, 1, len(group))`). – Klaster Jul 07 '15 at 23:55

1 Answer

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('/home/Jian/Downloads/real_data.csv')

# processing
# ==========================
fig, ax = plt.subplots()
ax.set_ylim([0, 1.2])
count = 0

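def func(group):
    # DataFrame.sort is the pandas 0.16 API; it was removed in later
    # versions in favour of DataFrame.sort_values
    group.sort('value', inplace=True)
    # evenly spaced x-values so groups of different sizes share one axis
    x = np.linspace(0, 1, len(group))
    global ax, count
    # pandas 0.16 calls func twice on the first group, so skip the
    # very first call to avoid plotting (and labelling) it twice
    if count > 0:
        ax.plot(x, group.value, label=group.groups.values[0])
    count += 1
    return group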

df.groupby('groups').apply(func)
ax.legend(loc='best')
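
For readers on a newer pandas, here is a minimal sketch of the same idea that avoids plotting inside `apply`. Iterating over the `groupby` object visits each group once and hands back the group key, which can be used for the label. The inline data is copied from the question; the helper name `plot_sorted_groups` is made up for illustration, and `sort_values` assumes pandas 0.17 or later.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# sample data copied from the question
df = pd.DataFrame({
    "groups": [1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    "value": [0, 0, 0, 0.1, 0.4, 0.5, 0.5, 0.8, 0.8, 0.9, 1, 1, 1, 1, 1, 1],
})


def plot_sorted_groups(df, ax):
    """Draw one sorted line per group (hypothetical helper, not from the answer)."""
    for key, group in df.groupby("groups"):
        # sort the values of this group in ascending order
        y = group["value"].sort_values().values
        # evenly spaced x-values so groups of different sizes share one axis
        x = np.linspace(0, 1, len(y))
        ax.plot(x, y, label="group = {}".format(key))
    return ax


fig, ax = plt.subplots()
ax.set_ylim([0, 1.2])
plot_sorted_groups(df, ax)
ax.legend(loc="best")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. Because nothing is plotted as a side effect of `apply`, no call counter is needed.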


Jianxun Li
  • Hey Jianxun, thanks for your answer! That works perfectly with the data I submitted. However, I have trouble with my real data, which I haven't been able to resolve myself. Would you mind having a look at my edits? – Klaster Jul 07 '15 at 23:54
  • @Klaster Sorry, that was my mistake. I forgot to pass `inplace=True` to `sort`. I've modified the code and it should work now. :-) – Jianxun Li Jul 08 '15 at 00:05
  • Terrific! Thanks again! – Klaster Jul 08 '15 at 06:42
  • Hey, I am trying to understand your code and a question came up. It appears that the function is called three times: twice for the first group and once for the second group. Why? Furthermore, I'd like to access group.indicator.key for labeling ("group = 0"...). However, the object passed to func is a DataFrame and not a groupby object. How can I label the graph accordingly? – Klaster Jul 08 '15 at 07:48
  • @Klaster It's due to the current implementation of pandas `groupby`. See this warning message: `Warning In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.` from http://pandas.pydata.org/pandas-docs/version/0.16.1/groupby.html – Jianxun Li Jul 08 '15 at 08:18
  • @Klaster I've modified the code to deal with the unexpected labelling issue. – Jianxun Li Jul 08 '15 at 08:23
  • Recent versions of pandas need `group.sort_values('groups', inplace=False)` based on this [answer](https://stackoverflow.com/a/44123892/1873897). – MikeF Mar 10 '20 at 20:50
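
To illustrate the last comment, here is a small sketch of how `sort_values` behaves with its default `inplace=False`. The three-row frame is made up for illustration, and it sorts by `'value'`, since that is the column the answer's `func` orders by.

```python
import pandas as pd

# tiny made-up group, deliberately out of order
group = pd.DataFrame({"groups": [1, 1, 1], "value": [0.8, 0.0, 0.4]})

# with the default inplace=False, sort_values returns a new sorted
# DataFrame and leaves the original untouched
sorted_group = group.sort_values("value")

print(sorted_group["value"].tolist())  # [0.0, 0.4, 0.8]
print(group["value"].tolist())         # [0.8, 0.0, 0.4]
```

So with the non-inplace form, the returned frame (not `group` itself) is what has to be plotted.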