1

I have 3 dataframes, train_data, validation_data and test_data, and I need to plot them one after another in different colors, so that the result looks like a single line divided into 3 colors. I tried to do that by shifting the start of the x-axis with xlim for the second and third time series, as the following code shows, but all of them are plotted starting from x=0. How can I fix it?

train_data.loc[idx].plot(kind='line'
                , use_index=False
                , color='blue'
                , label='Training Data'
                , legend=False)
validation_data.loc[idx].plot(kind='line'
                , use_index=False
                , figsize=(20, 5)
                , xlim=362
                , color='red'
                , label='Validation Data'
                , legend=False)
test_data.loc[idx].plot(kind='line'
                , use_index=False
                , figsize=(20, 5)
                , xlim=481
                , color='green'
                , label='Test Data'
                , legend=False)
plt.xlim(xmin=0)
plt.legend(loc=1, prop={'size': 'xx-small'})
plt.savefig("data.pdf")
plt.clf()
plt.close()

UPDATE:

All 3 dataframes have the shape (N, 28). There are 138 different indexes (idx), and each dataframe holds part of every index. Each index is a time series that was split into three parts: the training, validation and test datasets. I need to plot only the first column, var0, of each index. That's why I'm using <df>.loc[idx].iloc[:, 0]

df= 
        idx     var0    var1    var2    var3    var4 ...  var28
        5171    10.0    2.8     0.0     5.0     1.0  ...  9.4  
        5171    40.9    2.5     3.4     4.5     1.3  ...  7.7  
        5171    60.7    3.1     5.2     6.6     3.4  ...  1.0
        ...
        5171    0.5     1.3     5.1     0.5     0.2  ...  0.4
        4567    1.5     2.0     1.0     4.5     0.1  ...  0.4  
        4567    4.4     2.0     1.3     6.4     0.1  ...  3.3  
        4567    6.3     3.0     1.5     7.6     1.6  ...  1.6
        ...
        4567    0.7     1.4     1.4     0.3     4.2  ...  1.7
       ... 
        9584    0.3     2.6     0.0     5.2     1.6  ...  9.7  
        9584    0.5     1.2     8.3     3.4     1.3  ...  1.7  
        9584    0.7     3.0     5.6     6.6     3.0  ...  1.0
        ...
        9584    0.7     1.3     0.1     0.0     2.0  ...  1.7

I tried to combine all three dataframes into one and then plot it using slicing, as @Brendan Cox suggested. But I'm not getting the result I need: the plots still all start from x=0. Here is the code:

data = pd.concat([train_data.loc[idx].iloc[:, 0], validation_data.loc[idx].iloc[:, 0], test_data.loc[idx].iloc[:, 0]])
data.iloc[0:362].plot(kind='line'
                          , use_index=False
                          , figsize=(20,5)
                          , color='blue'
                          , label='Training Data'
                          , legend=False)
data.iloc[362:481].plot(kind='line'
                        , use_index=False
                        , figsize=(20, 5)
                        , color='red'
                        , label='Validation Data'
                        , legend=False)
data.iloc[481:].plot(kind='line'
                     , use_index=False
                     , figsize=(20, 5)
                     , color='green'
                     , label='Test Data'
                     , legend=False)

I attached the resulting plot (which is wrong!). I need the red and green lines to continue after the blue line.

Birish

2 Answers

2

If I'm understanding correctly, you should be able to simply subset (i.e., slice) your input data along the x-axis and plot each portion of the line -- e.g.:

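import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/fpp2/goog200.csv", index_col=0)
# Plot the whole series first
df['value'].plot()

# Then plot each portion on the same axes; every call gets the next color
df.loc[0:25,'value'].plot()
df.loc[25:150, 'value'].plot()
df.loc[150:, 'value'].plot()
plt.show()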



Edit per comments below: using iloc[] together with use_index=False seems to replicate the 'starting each plot at 0' behavior. Note that your ilocs do not select a column. Thus, you may need to revise both your iloc and your use_index=False.

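import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/fpp2/goog200.csv", index_col=0)

# With use_index=False every slice is drawn against 0..len-1, so all three start at x=0
df.iloc[0:25,1].plot(use_index=False)
df.iloc[25:150, 1].plot(use_index=False)
df.iloc[150:, 1].plot(use_index=False)
plt.show()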

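When the three pieces live in separate objects, as discussed in the comments below, one way to make them line up is to give each piece explicit x positions that continue where the previous one ended. The following is only a minimal sketch under assumptions: `train`, `validation` and `test` are hypothetical stand-in Series built from random numbers, not the asker's data, and the plotting goes through matplotlib's `Axes.plot` directly rather than `Series.plot`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the three separate pieces of one time series.
rng = np.random.default_rng(0)
train = pd.Series(rng.normal(size=30).cumsum())
validation = pd.Series(rng.normal(size=10).cumsum())
test = pd.Series(rng.normal(size=15).cumsum())

fig, ax = plt.subplots(figsize=(20, 5))
offset = 0
for part, color, label in [(train, "blue", "Training Data"),
                           (validation, "red", "Validation Data"),
                           (test, "green", "Test Data")]:
    # x positions continue where the previous segment ended
    x = np.arange(offset, offset + len(part))
    ax.plot(x, part.to_numpy(), color=color, label=label)
    offset += len(part)
ax.legend(loc=1)
```

Because each segment gets its own x range, nothing is drawn between the last point of one segment and the first point of the next; overlapping the segments by one point would close that small gap.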

Brendan
  • The plot you made is exactly what I need, but I have three separate dataframes. Do I have to combine them to one dataframe and then plot by slicing? Is that the only way? – Birish Jul 15 '19 at 20:45
  • @Birish I believe it should not matter as long as the x and y axes are shared. E.g., you can create `df2 = df.copy()` and `df3 = df.copy()`, then plot the second and third lines from `df2` and `df3` in my example, and it does the exact same thing. – Brendan Jul 15 '19 at 20:48
  • Just tried your suggested solution, still it plots them all starting from x=0 :/ – Birish Jul 15 '19 at 21:29
  • @Birish It's difficult to say more about exactly what's going on without sample data. I can confirm that my comment above holds true -- plotting each line segment from a separate dataframe has no impact, and it generates the same result. It seems there's something with your data that starts the plot at zero, rather than the method proposed above. If you post a sample of your data, or at least a [mcve](https://stackoverflow.com/help/minimal-reproducible-example) reproducing the behavior, that would be helpful, as with my (admittedly quick) testing, it works fine and as I would expect. – Brendan Jul 15 '19 at 23:56
  • But to be clear -- I'm not suggesting you actually index with `.iloc`. My solution is a demonstration of concept. It would probably be better to index based on a boolean condition related to your x-axis, like `df['year'] <= 1980`, `df['year'].between(1980,1990)`, etc. But again, hard to say more without example data. – Brendan Jul 15 '19 at 23:58
  • @Brendan Cox thanks for your time. I updated my question; I hope it is clearer now. – Birish Jul 16 '19 at 08:28
  • @Birish Try removing `use_index=False` – Brendan Jul 16 '19 at 13:14
  • That will plot only one value, which is my index. So if I remove it, the plot will be one straight line at the value `5171` :( – Birish Jul 16 '19 at 16:46
  • @Birish See edits above for a demonstration of how `use_index=False` leads to the behavior of each plot starting at 0. It may be related to this, and/or your method of slicing, but I think the answer above demonstrates the concept clearly enough. From there, it's debugging. I would start by going back to your basic dataframe, trying to plot each line across the entire range on one graph, and then subsetting, adding each step piece-by-piece so you can see where the error comes in. – Brendan Jul 16 '19 at 17:08
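
The boolean-condition indexing mentioned in the comments above could look roughly like the sketch below. It is only an illustration under assumptions: the `year` column and the 1980/1990 thresholds come from the comment, while the dataframe itself and its `value` column are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one value per year from 1970 to 1999.
df = pd.DataFrame({"year": np.arange(1970, 2000),
                   "value": np.linspace(0.0, 10.0, 30)})

# Select each segment by a condition on the x variable instead of by position.
early = df[df["year"] <= 1980]
middle = df[df["year"].between(1980, 1990)]  # inclusive on both ends by default
late = df[df["year"] >= 1990]

fig, ax = plt.subplots(figsize=(20, 5))
early.plot(x="year", y="value", ax=ax, color="blue", label="early")
middle.plot(x="year", y="value", ax=ax, color="red", label="middle")
late.plot(x="year", y="value", ax=ax, color="green", label="late")
```

Since the conditions share their boundary years, neighbouring segments have a point in common and the result reads as one continuous line.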
0

With help from this answer, I was able to fix the issue as follows:

import matplotlib.pyplot as plt

# idxs holds the index labels to plot (defined elsewhere)
for idx in idxs:
    # First column (var0) of each split, renumbered from 0
    train = train_data.loc[idx].iloc[:, 0].reset_index(drop=True)
    validation = validation_data.loc[idx].iloc[:, 0].reset_index(drop=True)
    test = test_data.loc[idx].iloc[:, 0].reset_index(drop=True)

    # x positions where the validation and test segments begin,
    # computed per index in case the series differ in length
    limit_1 = len(train)                 # 362 for idxs[0]
    limit_2 = limit_1 + len(validation)  # 481 for idxs[0]

    # Shift the later segments so each continues after the previous one
    validation.index = range(limit_1, limit_2)
    test.index = range(limit_2, limit_2 + len(test))

    train.plot(kind='line', figsize=(20, 5), color='blue',
               label='Training Data', legend=False)
    validation.plot(kind='line', figsize=(20, 5), color='red',
                    label='Validation Data', legend=False)
    test.plot(kind='line', figsize=(20, 5), color='green',
              label='Test Data', legend=False)

    plt.legend(loc=1, prop={'size': 'xx-small'})
    plt.title(str(idx))
    plt.savefig(str(idx) + ".pdf")
    plt.clf()
    plt.close()
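
A possible alternative, sketched here under assumptions rather than taken from the code above, is to concatenate the three pieces with `ignore_index=True` and then plot slices of the combined series without `use_index=False`. The three Series below are hypothetical stand-ins for `train_data.loc[idx].iloc[:, 0]` and its validation and test counterparts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the first column of each split for one idx.
train = pd.Series(np.linspace(0.0, 1.0, 30))
validation = pd.Series(np.linspace(1.0, 0.5, 10))
test = pd.Series(np.linspace(0.5, 2.0, 15))

# ignore_index=True renumbers the combined series 0..N-1, replacing the
# repeated idx label that made the original index useless as an x-axis.
data = pd.concat([train, validation, test], ignore_index=True)
limit_1 = len(train)
limit_2 = limit_1 + len(validation)

fig, ax = plt.subplots(figsize=(20, 5))
# No use_index=False here: each slice keeps its own positions on the x-axis.
data.iloc[:limit_1].plot(ax=ax, color="blue", label="Training Data")
data.iloc[limit_1:limit_2].plot(ax=ax, color="red", label="Validation Data")
data.iloc[limit_2:].plot(ax=ax, color="green", label="Test Data")
ax.legend(loc=1)
```

The design difference is that the x positions come from the renumbered index of the combined series, so no per-segment index assignment is needed.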
Birish