I have the following pandas DataFrame:
import pandas as pd
df = pd.read_table(...)
>>> df
  interval  location type  y_axis
0       01      1230    X      50
1       01      1609    X      55
2       01      1903    Y      54
3       01      2574    A      58
4       01      3151    A      57
5       01      3198    B      46
6       01      3312    X      50
...
        02        42    X      31
        02       214    A      23
        02       598    X      28
...
There are several intervals, e.g. 01, 02, etc. Within each interval, data points lie within the range of 1 to 15,000. In df, the first data point of interval 01 is at 1230, the next at 1609, etc. Interval 02 also has a range from 1 to 15,000.
I would like to create a scatterplot in which the range of 1 to 15,000 is plotted proportionally for each interval. The first point would then be plotted at 1230, the next at 1609, etc. I would also like vertical lines showing where the intervals begin and end. Each interval is a "region" with its own x-axis running from 1 to 15,000, so the coordinates along the x-axis are interval 1: 1 to 15,000, interval 2: 1 to 15,000, interval 3: 1 to 15,000, etc. (It is almost like several individual scatterplots concatenated together.)
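To make the intended layout concrete, here is a minimal sketch of the coordinate mapping I have in mind. It is only an illustration: the names concatenated_x, boundaries and INTERVAL_WIDTH are hypothetical, and it assumes every interval is exactly 15,000 wide and that the interval labels are consecutive numeric strings starting at "01".

```python
INTERVAL_WIDTH = 15000  # assumption: every interval spans 1 to 15,000


def concatenated_x(interval, location, width=INTERVAL_WIDTH):
    """Map (interval label, location) to a position on the concatenated x-axis.

    Interval "01" occupies 1..width, "02" occupies width+1..2*width, and so on.
    """
    index = int(interval) - 1  # "01" -> 0, "02" -> 1, ...
    return index * width + location


def boundaries(n_intervals, width=INTERVAL_WIDTH):
    """x positions where the vertical separator lines between intervals would go."""
    return [k * width for k in range(1, n_intervals)]


# A few rows taken from the DataFrame shown above
rows = [("01", 1230), ("01", 1609), ("02", 42), ("02", 214)]
xs = [concatenated_x(interval, location) for interval, location in rows]
print(xs)             # [1230, 1609, 15042, 15214]
print(boundaries(3))  # [15000, 30000]
```

The tick labels would still need to restart at 1 inside each region, which is the part I do not know how to do.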
How does one accomplish this? Without the complication of intervals, one would create a scatterplot from this DataFrame with:
df.plot(kind='scatter', x = "location", y = "y_axis")
Here are the first 50 rows:
d = {"interval" : ["01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01"], "location" : [1230, 1609,
1903, 2574, 3151, 3198, 3312, 3659, 3709,
3725, 4172, 4542, 4860, 4900, 5068, 5220,
5260, 5339, 5442, 5529, 5773, 6128, 6165,
6177, 6269, 6275, 6460, 7167, 7361, 7361,
8051, 8222, 8305, 8992, 9104, 9439, 9844,
10045, 10764, 10787, 11104, 11478, 11508,
11684, 12490, 12590, 12794, 12803, 13823,
13982], "type" : ["X", "X", "Y", "A", "A",
"B", "X", "X", "X", "B", "B", "A", "A", "A", "B", "B", "X",
"B", "Y", "X", "X", "Y", "Y", "C", "A", "X", "X", "Z", "Z",
"B", "X", "X", "A", "A", "Y", "X", "A", "X", "X", "Z", "Z",
"C", "X", "Y", "Y", "Z", "Z", "Z", "Z", "Z"], "y_axis" : [50, 55,
54, 58, 57, 46, 50, 55, 46, 42, 56, 55, 55, 45, 52, 51, 45, 48, 50,
49, 53, 55, 45, 40, 49, 37, 52, 58, 52, 4, 58, 52, 49, 58, 50, 55,
56, 53, 58, 43, 55, 55, 44, 52, 59, 49, 53, 39, 60, 52]}