- Using p = Path(...): p → WindowsPath('so_data/files')
  - files = p.rglob(...) yields all files matching the pattern
    - file (one item yielded by files) → WindowsPath('so_data/files/data_1.csv')
  - p.parent / 'plots' / f'{file.stem}.png' → WindowsPath('so_data/plots/data_1.png')
    - p.parent → WindowsPath('so_data')
    - file.stem → data_1
- This assumes all directories exist. Directory creation / checking is not included.
- This example uses pandas, as does the OP.
- Plotted with pandas.DataFrame.plot, which uses matplotlib as the default backend.
- Use .iloc to select each pair of columns by position; x=0 then always refers to the first selected column as the x-axis data, based on the given example data.
- Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3
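The path expressions in the first bullet can be checked without any plotting. This is only a minimal sketch: the temporary root directory and the extra notes.txt file are assumptions for illustration, and sorted() is added because rglob yields results lazily and in no guaranteed order.

```python
import tempfile
from pathlib import Path

# Throwaway layout mirroring so_data/files; the temp root is an assumption.
with tempfile.TemporaryDirectory() as root:
    p = Path(root) / 'so_data' / 'files'
    p.mkdir(parents=True)
    for i in (1, 2):
        (p / f'data_{i}.csv').write_text('x1,y1\n5.0,60\n')
    (p / 'notes.txt').write_text('ignored by the pattern')

    # rglob returns a generator, so it cannot be indexed until it is materialized
    files = sorted(p.rglob('data_*.csv'))
    file = files[0]

    names = [f.name for f in files]                 # only files matching the pattern
    stem = file.stem                                # file name without the suffix
    out = p.parent / 'plots' / f'{file.stem}.png'   # sibling 'plots' directory
    out_rel = out.relative_to(root).as_posix()      # path relative to the temp root
```

Building the output path does not touch the file system, so out can be computed even though so_data/plots does not exist yet.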
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

p = Path('so_data/files')  # specify the path to the files
files = p.rglob('data_*.csv')  # generator for all files based on rglob pattern

for file in files:
    df = pd.read_csv(file, header=0, sep=',')  # specify header row and separator as needed
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
    df.iloc[:, [0, 1]].plot(x=0, ax=ax1)  # plot 1st x/y pair; assumes x data is at position 0
    df.iloc[:, [2, 3]].plot(x=0, ax=ax2)  # plot 2nd x/y pair; assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
    plt.close(fig)  # close each figure, otherwise they stay in memory
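The loop above assumes the plots directory already exists. One possible way to create it beforehand is Path.mkdir with parents=True and exist_ok=True. The helper name ensure_plot_dir and the temporary root are hypothetical, used here only so the sketch can run anywhere.

```python
import tempfile
from pathlib import Path

def ensure_plot_dir(p: Path) -> Path:
    """Create a 'plots' directory next to p if it is missing (hypothetical helper)."""
    plot_dir = p.parent / 'plots'
    # parents=True creates missing parent directories; exist_ok=True makes repeat calls harmless
    plot_dir.mkdir(parents=True, exist_ok=True)
    return plot_dir

with tempfile.TemporaryDirectory() as root:  # temp root is an assumption for the demo
    p = Path(root) / 'so_data' / 'files'
    first = ensure_plot_dir(p)
    second = ensure_plot_dir(p)  # second call does nothing and does not raise
    created = first.is_dir()
    same = first == second
    rel = first.relative_to(root).as_posix()
```

Calling such a helper once before the for loop would let fig.savefig write to p.parent / 'plots' without a manual setup step.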
Sample Data
- This is for testing the plotting code
- Create a so_data/files directory manually.
df = pd.DataFrame({'x1': [5.0, 6.0, 7.0, 8.0, 9.0], 'y1': [60, 70, 80, 90, 100],
                   'x2': [5.5, 6.5, 7.5, 8.5, 9.5], 'y2': [500, 600, 700, 800, 900]})

for x in range(1, 1001):  # write 1000 identical test files
    df.to_csv(f'so_data/files/data_{x}.csv', index=False)
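With data shaped like the sample above, the positional selection used in the plotting loop can be inspected directly. This is a small sketch on a shortened copy of the sample frame; it assumes the column labels are strings, in which case pandas should treat an integer x as a position within the selected columns.

```python
import pandas as pd

# Shortened copy of the sample data (two rows are enough to inspect the columns)
df = pd.DataFrame({'x1': [5.0, 6.0], 'y1': [60, 70],
                   'x2': [5.5, 6.5], 'y2': [500, 600]})

second = df.iloc[:, [2, 3]]         # 3rd and 4th columns, selected by position
x_col = second.columns[0]           # the column that x=0 refers to after selection
y_cols = list(second.columns[1:])   # the remaining column(s) are plotted as y
```

Because each .iloc selection starts again at position 0, the same x=0 works for every pair.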
Alternate Answer
- This answer addresses cases where there are many consecutive pairs of x/y columns
- df.columns returns an array of column names that can be chunked into pairs
  - For consecutive column pairs, this works: list(zip(*[iter(df.columns)]*2)) → [('x1', 'y1'), ('x2', 'y2')]
  - If necessary, use some other pattern to create pairs of columns
- Use .loc, since the selection is by column names, instead of .iloc, which selects by column indices.
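The chunking idiom in the bullets above works because the same iterator object is passed to zip twice, so each tuple consumes two consecutive items. A minimal sketch on a plain list, with chunk_pairs as a hypothetical helper name:

```python
def chunk_pairs(columns):
    """Group consecutive items into 2-tuples (hypothetical helper)."""
    it = iter(columns)
    # zip pulls from the same iterator twice per tuple;
    # equivalent to list(zip(*[iter(columns)]*2))
    return list(zip(it, it))

cols = ['x1', 'y1', 'x2', 'y2']
pairs = chunk_pairs(cols)
odd = chunk_pairs(['x1', 'y1', 'x2'])  # a trailing unpaired item is silently dropped
```

The silent drop on an odd number of columns is worth keeping in mind if a file has an extra column, such as an index written by mistake.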
p = Path('so_data/files')
files = p.rglob('data_*.csv')

for file in files:
    df = pd.read_csv(file, header=0, sep=',')
    col_pair = list(zip(*[iter(df.columns)]*2))  # extract column pairs
    # one subplot per column pair; squeeze=False always returns a 2D array of axes,
    # so the ravel() below also works when there is only a single pair
    fig, axes = plt.subplots(len(col_pair), 1, squeeze=False)
    axes = axes.ravel()  # flatten the axes
    for cols, ax in zip(col_pair, axes):
        df.loc[:, list(cols)].plot(x=0, ax=ax)  # assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
    plt.close(fig)  # close each figure, otherwise they stay in memory