
I am trying to plot multiple different lines on the same graph using pandas and matplotlib.

I have a series of 100 synthetic temperature histories that I want to plot, all in grey, against the original real temperature history I used to generate the data.

How can I plot all of these series on the same graph? I know how to do it in MATLAB but am using Python for this project, and pandas has been the easiest way I have found to read in every single column from the output file without having to specify each column individually. The number of columns of data will change from 100 up to 1000, so I need a generic solution. My code plots each of the data series individually fine, but I just need to work out how to add them both to the same figure.

Here is the code so far:

# dirPath is the path to my working directory
outputFile = "output.csv"
original_data = "temperature_data.csv"

# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)

# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)

# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)

# Plot all columns of synthetic data in grey
data = data.plot.line(title="ARMA Synthetic Temperature Histories",
                        xlabel="Time (yrs)",
                        ylabel="Synthetic average hourly temperature (C)",
                        color="#929591",
                        legend=None)

# Plot one column of original data in black
orig_data = orig_data.plot.line(color="k",legend="Original temperature data")

# Create and save figure
fig = data.get_figure()
fig = orig_data.get_figure()
fig.savefig("temp_arma.png")

This is some example data for the output data:

[image: sample of the synthetic output data]

And this is the original data:

[image: sample of the original data]

Plotting each individually gives these graphs - I just want them overlaid!

[images: the two separate plots, one per data set]

MUD

3 Answers


Your `data.plot.line` call returns a matplotlib Axes instance. Capture it and pass it to the second plotting call through the `ax` argument:

# plot 1
ax = data.plot.line(…)

# plot 2
orig_data.plot.line(…, ax=ax)

Try to run this code:

# convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)

# Plot all columns of synthetic data in grey
ax = data.plot.line(title="ARMA Synthetic Temperature Histories",
                    xlabel="Time (yrs)",
                    ylabel="Synthetic average hourly temperature (C)",
                    color="#929591",
                    legend=False)

# Plot one column of original data in black on the same Axes.
# `legend` takes a bool; the legend label comes from the column name.
orig_data.plot.line(color="k", legend=True, ax=ax)

# Create and save figure
ax.figure.savefig("temp_arma.png")
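
As a rough, self-contained check of the idea (not part of the original answer), the sketch below uses made-up random data in place of the CSV files. The names `synthetic`, `original`, and the in-memory buffer are assumptions for illustration only. It shows that the second call returns the very same Axes when `ax=ax` is passed, and that the Figure can be reached from the Axes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 synthetic series and 1 "original" series
rng = np.random.default_rng(0)
time = np.arange(25)
synthetic = pd.DataFrame(rng.normal(size=(25, 5)), index=time)
original = pd.DataFrame(
    {"Original temperature data": rng.normal(size=25)}, index=time
)

# The first call creates a new Axes and returns it
ax = synthetic.plot.line(color="#929591", legend=False)

# Passing ax=ax draws onto that same Axes instead of creating a new one
ax2 = original.plot.line(color="k", ax=ax)

same_axes = ax2 is ax          # both calls refer to one Axes
n_lines = len(ax.get_lines())  # 5 grey lines plus 1 black line

# The Figure is reachable from the Axes, so no separate handle is required
fig = ax.get_figure()
buf = io.BytesIO()             # swap for a file name such as "temp_arma.png"
fig.savefig(buf, format="png")
```

Because only the Axes is needed, the variables holding the DataFrames are never overwritten, which avoids losing the data after the first plot call.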
mozway
  • Doing this gives me the error "AttributeError: 'DataFrame' object has no attribute 'get_figure'" – MUD Jul 10 '21 at 17:20
  • Why do you use the variable "data" to catch the return of `data.plot.line(...)`? [pandas.DataFrame.plot.line](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.line.html) returns a matplotlib.axes.Axes object. Can you try to run the code in my edited answer? – mozway Jul 12 '21 at 21:21

You can use matplotlib functions directly. They offer more control and are easy to use as well.

Part 1 - Reading files (borrowing your code)

import pandas as pd

# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)

# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)

# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)

Part 2 - Plotting

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))

# Plot all synthetic histories in grey.
# Time is the DataFrame index (index_col=0), so every column is a data series.
for col in data.columns:
    ax.plot(data.index, data[col], color="grey")

# Plot the original series (the only data column) in black
ax.plot(orig_data.index, orig_data.iloc[:, 0], color="k")

# Axis labels
ax.set_xlabel("Time (yrs)")
ax.set_ylabel("Average hourly temperature (C)")

fig.savefig("temp_arma.png")
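
Since the number of columns may grow from 100 to 1000, the loop above can also be replaced by a single call: when matplotlib's `ax.plot` receives a 2D `y` array, it draws one line per column. The sketch below is an illustration under assumptions, not part of the original answer. It uses random stand-in data (the sizes `n_steps` and `n_series` are made up) and adds `zorder` so the original series stays on top of the grey ones.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data shaped like the files in the question:
# time in the index, one column per series
rng = np.random.default_rng(1)
n_steps, n_series = 30, 100
time = np.linspace(0.0, 1.0, n_steps)
data = pd.DataFrame(rng.normal(size=(n_steps, n_series)), index=time)
orig_data = pd.DataFrame(rng.normal(size=(n_steps, 1)), index=time)

fig, ax = plt.subplots(figsize=(10, 8))

# A 2D y array is drawn column by column, giving one Line2D per series
grey_lines = ax.plot(
    data.index.to_numpy(), data.to_numpy(), color="grey", zorder=1
)

# A higher zorder keeps the original series visible above the grey lines
(orig_line,) = ax.plot(
    orig_data.index.to_numpy(),
    orig_data.iloc[:, 0].to_numpy(),
    color="k",
    zorder=2,
    label="Original temperature data",
)

# Show a legend entry for the original series only
ax.legend(handles=[orig_line])

ax.set_xlabel("Time (yrs)")
ax.set_ylabel("Average hourly temperature (C)")
```

Converting to NumPy arrays with `to_numpy()` before plotting is a defensive choice here; passing the pandas objects directly usually works as well.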
Piyush Singh

Try the following:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
data.plot.line(ax=ax,
               title="ARMA Synthetic Temperature Histories",
               xlabel="Time (yrs)",
               ylabel="Synthetic average hourly temperature (C)",
               color="#929591",
               legend=False)
orig_data.rename(columns={orig_data.columns[0]: "Original temperature data"},
                 inplace=True)
orig_data.plot.line(ax=ax, color="k")

It's pretty much your original code with the following slight modifications:

Getting the ax object

fig, ax = plt.subplots()

and using it for the plotting

data.plot.line(ax=ax, ...
...
orig_data.plot.line(ax=ax, ...)

Result for some randomly generated sample data:

import random  # For sample data only
import pandas as pd

# Sample data
data = pd.DataFrame({
    f'col_{i}': [random.random() for _ in range(25)]
    for i in range(1, 50)
})
orig_data = pd.DataFrame({
    'col_0': [random.random() for _ in range(25)]
})

[image: resulting plot for the sample data]

Timus