
I have a DataFrame (see the 'Test Data' section below) and I would like to add a secondary x-axis at the top. This axis has to run from 0 to 38.24 ms, which is the sum of all values in the 'Time' column. It expresses the total time that the 4 inferences took to execute. So far I have tried 'twinx()' without success.

How can I do that? Is it possible or am I lacking information?

Test Data:

import pandas as pd

raw_data = {'Time': [21.9235, 4.17876, 4.02168, 3.81504, 4.2972],
            'TPU': [33.3, 33.3, 33.3, 33.3, 33.3],
            'CPU': [32, 32, 32, 32, 32],
            'MemUsed': [435.92, 435.90, 436.02, 436.02, 436.19]}

df_m = pd.DataFrame(raw_data, columns=['Time', 'TPU', 'CPU', 'MemUsed'])

# Sum of all values in column Time (ms)
total_time = df_m['Time'].sum()
print(total_time)

# Memory used per inference (MB)
ax = df_m.plot(kind='line', y='MemUsed', grid=True)
ax.set_xlabel("NUMBER OF INFERENCES")
ax.set_ylabel("MemUsed(MB)")

What I have tried:

ax = df_m.plot(kind = 'line', y = 'MemUsed', grid = True)
df_m.plot(kind='line', ax=ax.twinx(), secondary_x=range(0, 39))
ax.set_xlabel("NUMBER OF INFERENCES")
ax.set_ylabel("MemUsed(MB)")

Output Graph:

[Image: output graph]

What the big table looks like:

[Image: screenshot of the larger table]

Aizzaac

2 Answers


Further to your positive comment regarding plotly, here is an example of how to achieve a multi-xaxis for your dataset.

The code is a lot simpler than it looks. The code appears 'lengthy' due to the way I've formatted the dicts for easier reading.

The key elements are:

  • Adding a cumulative sum of the time column (time_c) for use on xaxis2.
  • Adding a hidden trace which aligns to xaxis, and your time data which aligns to xaxis2. Without the hidden trace only one trace is plotted, and the two axes either fail to appear or appear misaligned.
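
To make the first bullet concrete, here is a minimal pure-Python sketch of the cumulative sum, using the five 'Time' values from the question. The names `times_ms` and `cumulative_ms` are illustrative only; the sample code itself uses pandas `cumsum()` for the same step.

```python
from itertools import accumulate

# The five 'Time' values (ms) from the question's test data.
times_ms = [21.9235, 4.17876, 4.02168, 3.81504, 4.2972]

# Running total after each inference, rounded to 2 decimals as in the sample code.
cumulative_ms = [round(total, 2) for total in accumulate(times_ms)]

# Pair each inference index (bottom axis) with its cumulative time (top axis).
for index, total in enumerate(cumulative_ms):
    print(index, total)
```

The last value is 38.24, which is the upper bound the question asks for on the top axis.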

(Updated) Sample Code:

The following code has been updated to address the issue OP was having with a larger (70k row) dataset.

The key change is an update to the layout['xaxis'] and layout['xaxis2'] dicts to contain 'type': 'category', 'nticks' and defined 'range' keys.

import pandas as pd
from plotly.offline import plot

# Create the dataset.
raw_data = {'time': [21.9235, 4.17876, 4.02168, 3.81504, 4.2972],
            'tpu': [33.3, 33.3, 33.3, 33.3, 33.3],
            'cpu': [32, 32, 32, 32, 32],
            'memused': [435.92, 435.90, 436.02, 436.02, 436.19]}

df = pd.DataFrame(raw_data)
df['time_c'] = df['time'].cumsum().round(2)

# Plotting code.
data = []
layout = {'margin': {'t': 105},
          'title': {'text': 'Example Showing use of Secondary X-Axis', 
                    'y': 0.97}}

# Create a (hidden) trace for the xaxis.
data.append({'x': df.index,
             'y': df['memused'],
             'showlegend': False,
             'mode': 'markers', 
             'marker': {'size': 0.001}})
# Create the visible trace for xaxis2.
data.append({'x': df['time_c'],
             'y': df['memused'],
             'xaxis': 'x2',
             'name': 'Inference'})

# Configure graph layout.
nticks = int(df.shape[0] // (df.shape[0] * 0.05))
layout['xaxis'] = {'title': 'Number of Inferences',
                   'nticks': nticks,
                   'range': [df.index.min(), df.index.max()],
                   'tickangle': 45,
                   'type': 'category'}
layout['xaxis2'] = {'title': 'Time(ms)', 
                    'nticks': nticks,
                    'overlaying': 'x1', 
                    'range': [df['time_c'].min(), df['time_c'].max()],
                    'side': 'top', 
                    'tickangle': 45,
                    'type': 'category'}
layout['yaxis'] = {'title': 'Memory Used (MB)'}

fig = {'data': data, 'layout': layout}
plot(fig, filename='/path/to/graph.html')
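
A side note on the `nticks` expression in the layout code above: it divides the row count by 5% of itself, so it evaluates to roughly 20 whatever the dataset size. Plotly treats `nticks` as an upper limit on the number of ticks, which is what keeps the category axes readable with 70k rows. The sketch below only reproduces the arithmetic; `tick_budget` is a hypothetical helper name, not part of plotly.

```python
def tick_budget(n_rows, fraction=0.05):
    """Mirror the answer's expression: n_rows // (n_rows * fraction)."""
    return int(n_rows // (n_rows * fraction))


# Row counts of the original and the larger dataset.
for n_rows in (5, 70000):
    print(n_rows, tick_budget(n_rows))
```

Because the result does not depend on the row count in practice, a constant such as `20` would likely behave the same way here.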

Example Graph (original dataset):

I've intentionally left out any additional appearance configuration for code simplicity. However, referring to the top level plotly docs, the graphs are highly configurable.

[Image: example graph, original dataset]

Example Graph (new dataset):

This graph uses the (larger, 70k row) synthesised dataset from the other answer.

[Image: example graph, 70k-row dataset]

S3DEV
  • I have problems when using a big dataframe – Aizzaac Sep 15 '20 at 17:00
  • That might be because the data is actually getting plotted twice; I think I have a solution. Is the dataset similar to what you have posted in the question? How many rows? I have an idea for a different xaxis approach. – S3DEV Sep 15 '20 at 18:32
  • Yes, it is similar. I will put an image. There are 70000 rows – Aizzaac Sep 15 '20 at 18:37
  • There we go. I've updated the code and graphs in the original answer to address the issue you're having. Additionally, I've posted a second answer (sshhhh) to give you another option, which removes the need to plot *two* traces of 70k data points each. Hope this helps! – S3DEV Sep 15 '20 at 21:39
  • It has worked. It is faster. I will analyze your code. – Aizzaac Sep 15 '20 at 22:19

Although posting multiple answers is generally discouraged, I'm adding another one to address the new dataset, since the previous answer works for the original dataset.

This example diverges from the original request of a secondary x-axis for two reasons:

  1. Due to the size of the (new) dataset, plotting a 'hidden' layer of data is not optimal.
  2. For a secondary x-axis to display properly, a second trace must be plotted, and given the previous reason, this is no longer an option.

Therefore, a different approach has been taken - that of combined labeling of the x-axis. Rather than plotting two axes, the single x-axis features both required labels.
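
As a small illustration of the combined-label idea, this sketch builds the labels for the five-row dataset from the question in plain Python. It assumes the same "(run): cumulative time" format and one-decimal rounding used in the sample code; the variable names are illustrative.

```python
from itertools import accumulate

# The five 'Time' values (ms) from the question's test data.
times_ms = [21.9235, 4.17876, 4.02168, 3.81504, 4.2972]

# Cumulative time after each inference, rounded to one decimal place.
cumulative_ms = [round(total, 1) for total in accumulate(times_ms)]

# One label per point: inference number plus cumulative time.
labels = [f"({run}): {total}" for run, total in enumerate(cumulative_ms)]
print(labels)
```

Each label carries both pieces of information, so a single category axis can stand in for the two separate axes.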

Example Graph:

Note: This is (obviously) synthesised data, in order to achieve the number of rows (70k) in the updated question.

[Image: example graph with combined x-axis labels]

Sample Code:

import numpy as np
import pandas as pd
from plotly.offline import plot

# Synthesised dataset. (This code can be ignored.)
np.random.seed(0)
a = np.random.exponential(size=70000)*4
t = pd.Series(a).rolling(window=2000, min_periods=50).mean().to_numpy()
r = np.arange(70000).astype(str)
m = t*100

df = pd.DataFrame({'run': r, 
                   'time': t,
                   'memused': m}).dropna()

# Add cumulative time column.
df['time_c'] = df['time'].cumsum().round(1)


# --- Graphing code starts here ---

def create_labels(x):
    """Function to create xaxis labels."""
    return f"({x['run']}): {x['time_c']}"

# Create xaxis labels.
df['xaxis'] = df.apply(create_labels, axis=1)

# Create the graph.
data = []
layout = {'title': 'Combined X-Axis Labeling'}
data.append({'x': df['xaxis'], 
             'y': df['memused']})

layout['xaxis'] = {'title': '(Inference): Cumulative Time (ms)', 
                   'type': 'category', 
                   'nticks': df.shape[0] // 3500,
                   'tickangle': 45}
layout['yaxis'] = {'title': 'Memory Used (MB)'}


fig = {'data': data, 'layout': layout}
plot(fig, filename='/path/to/graph.html')
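
One optional variation, offered as a sketch rather than a required change: `df.apply(..., axis=1)` calls a Python function once per row, which can be slow on 70k rows. Assuming `run` holds strings (as it does in the synthesised dataset), the same labels can be built column-wise. The small frame below is a stand-in for illustration.

```python
import pandas as pd

# Small stand-in frame with the two columns the label function reads.
df = pd.DataFrame({'run': ['0', '1', '2'],
                   'time_c': [21.9, 26.1, 30.1]})

# Row-wise version, equivalent to create_labels in the sample code.
row_wise = df.apply(lambda x: f"({x['run']}): {x['time_c']}", axis=1)

# Column-wise version: build the same strings with string concatenation.
df['xaxis'] = '(' + df['run'] + '): ' + df['time_c'].astype(str)

print(df['xaxis'].tolist())
```

Both versions should produce identical labels for data like this; it is worth checking on your own frame before swapping one for the other.
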
S3DEV