Before asking this question, I spent a day searching previous Stack Overflow answers and the wider Internet, but I couldn't find a solution to my problem.
I have a data frame of oil production in the US over time. The data includes a date column and the corresponding values. A minimal reproducible example for loading the data is below:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/Arsik36/SO/master/Oil%20production.csv',
                 parse_dates=['date'],
                 index_col='date')
I use the code below to visualize the general trend in oil production over time:
# Visualizing Time Series
df.value.plot(title = 'Oil production over time')
# Labeling the x-axis
plt.xlabel('Date')
# Labeling the y-axis
plt.ylabel('Oil production volume')
# Improving visual aesthetics
plt.tight_layout()
# Showing the result
plt.show()
If you run this code in your environment, you will see that the plot shows the values over time. What I am struggling with is either splitting the plot into subplots by year (for example, 1995 to 1997) or showing a separate line for each year on one graph. I tried the following:
df['1995' : '1997'].value.plot(title = 'Oil production over time', subplots = True)
When I use this code, it correctly subsets my data to the years 1995 to 1997, and with subplots = True the graph is indeed separated by year. However, if you run this in your environment, you can see that the separation is only along the x-axis, and a single line shows the results for all 3 years. What I am trying to do is either to split the plot into 3 subplots for the years 1995, 1996, and 1997, or to show 3 lines in one plot, each line corresponding to one year.
It is important to me to be able to do this by keeping the date column as the index column without creating any additional columns (if possible) to solve this problem.
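One direction that might satisfy this constraint is to group on the year of the DatetimeIndex itself, which needs no extra columns. The sketch below is only a rough illustration under some assumptions: the index is sorted by date, the column is named value as in the code above, and the helper names plot_years_as_subplots and plot_years_as_lines are hypothetical, not part of pandas or matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_years_as_subplots(df, start, end, column="value"):
    """Draw one subplot per calendar year (hypothetical helper)."""
    # Partial-string slicing on the DatetimeIndex; assumes a sorted index
    subset = df.loc[start:end, column]
    # Group on the index's year, so no helper column is created
    groups = list(subset.groupby(subset.index.year))
    fig, axes = plt.subplots(len(groups), 1,
                             figsize=(8, 3 * len(groups)), squeeze=False)
    for ax, (year, series) in zip(axes.ravel(), groups):
        series.plot(ax=ax, title=f"Oil production in {year}")
        ax.set_xlabel("Date")
        ax.set_ylabel("Oil production volume")
    fig.tight_layout()
    return fig


def plot_years_as_lines(df, start, end, column="value"):
    """Draw one line per calendar year on a shared axis (hypothetical helper)."""
    subset = df.loc[start:end, column]
    fig, ax = plt.subplots()
    for year, series in subset.groupby(subset.index.year):
        # Use day of year on the x-axis so the years overlap instead of
        # following one another along the date axis
        ax.plot(series.index.dayofyear, series.to_numpy(), label=str(year))
    ax.set_title("Oil production over time")
    ax.set_xlabel("Day of year")
    ax.set_ylabel("Oil production volume")
    ax.legend(title="Year")
    fig.tight_layout()
    return fig
```

With the data frame loaded above, calling plot_years_as_subplots(df, '1995', '1997') or plot_years_as_lines(df, '1995', '1997') and then plt.show() should display the figure. The date column stays as the index and df itself is not modified.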
Thank you in advance for your help.