I am using matplotlib to graph my results from a .dat file.

The data is as follows:

1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
29, 2021-07-25 00:00:00, Colin, italy the third, 10, 10, 50
5, 2021-07-22 00:00:00, Veronica, canada, 10, 100, 1000
1149, 1234-12-13 00:00:00, Billy Larkin, 1123, 12.75, 65.0, 162.75

I want to plot a year's worth of data (Jan to Dec) in the proper sequence, with the months as the x-axis labels instead of the full date.

Here is my code:

import matplotlib.pyplot as plt
import csv

x = []
y = []

with open('Claims.dat','r') as csvfile:
    #bar = csv.reader(csvfile, delimiter=',')
    plot = csv.reader(csvfile, delimiter=',')

    for row in plot:
        x.append(str(row[1]))
        y.append(str(row[6]))

plt.plot(x,y, label='Travel Claim Totals!', color='red', marker="o")
plt.xlabel('Months', color="red", size='large')

plt.ylabel('Totals', color="red", size='large')
plt.title('Claims Data:   Team Bobby\n Second Place is the First Looser', color='Blue', weight='bold', size='large')

plt.xticks(rotation=45, horizontalalignment='right', size='small')
plt.yticks(weight='bold', size='small', rotation=45)

plt.legend()
plt.subplots_adjust(left=0.2, bottom=0.40, right=0.94, top=0.90, wspace=0.2, hspace=0)
plt.show()


Trenton McKinney

2 Answers

I think the easiest way is to re-sort the data by date, which you can construct with the datetime module. Here is a minimal working example based on your data:

import datetime

def isfloat(value: str):
  try:
    float(value)
    return True
  except ValueError:
    return False

def isdatetime(value: str):
  try:
    datetime.datetime.fromisoformat(value)
    return True
  except ValueError:
    return False

data = r"""1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
29, 2021-07-25 00:00:00, Colin, italy the third, 10, 10, 50
5, 2021-07-22 00:00:00, Veronica, canada, 10, 100, 1000
1149, 1234-12-13 00:00:00, Billy Larkin, 1123, 12.75, 65.0, 162.75"""

data = data.splitlines()  # one string per record, so each entry can be replaced by a list

for idx in range(len(data)):
  data[idx] = data[idx].split(', ')
  for jdx in range(len(data[idx])):
    if data[idx][jdx].isnumeric():    # Is it an integer?
      value = int(data[idx][jdx])
    elif isfloat(data[idx][jdx]):     # Is it a float?
      value = float(data[idx][jdx])
    elif isdatetime(data[idx][jdx]):  # Is it a date?
      value = datetime.datetime.fromisoformat(data[idx][jdx])
    else:
      value = data[idx][jdx]
    data[idx][jdx] = value

data.sort(key=lambda x: x[1])

You can also sort on a single component of the date, such as the month:

data.sort(key=lambda x: x[1].month)
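One caveat with a month-only key is that records from different years get interleaved. The following is a small sketch, not part of the original answer, that compares a month-only key with a (year, month, day) key and builds short month labels with `strftime('%b')`. The `records` list is hypothetical sample data, and the label text assumes the default English locale.

```python
import datetime

# Hypothetical sample data: (date, total) pairs spanning two years.
records = [
    (datetime.datetime(2021, 7, 17), 162.75),
    (datetime.datetime(2020, 12, 13), 162.75),
    (datetime.datetime(2021, 3, 2), 50.0),
]

# A month-only key ignores the year, so December 2020 lands after July 2021.
by_month = sorted(records, key=lambda r: r[0].month)

# A (year, month, day) key keeps true chronological order.
chronological = sorted(records, key=lambda r: (r[0].year, r[0].month, r[0].day))

# Short month names that could be used as tick labels.
labels = [r[0].strftime('%b') for r in chronological]

print([r[0].date().isoformat() for r in by_month])
print(labels)
```

If the file only ever holds one calendar year, the month-only key is enough; otherwise the tuple key is the safer choice.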

Note: You might not need all the logic in the for-loop. The csv module handles the splitting for you, but it returns every field as a string, so the dates and numbers still have to be converted explicitly.
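As a rough sketch of the csv-based route, assuming the same column layout as the question (date in column 1, total in column 6): `csv.reader` does the splitting, `skipinitialspace=True` drops the blank after each comma, and the two columns of interest are converted by hand. The in-memory `raw` string stands in for `Claims.dat` so the snippet runs on its own.

```python
import csv
import datetime
import io

# Three rows copied from the question; io.StringIO stands in for the file.
raw = """1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
"""

rows = []
# skipinitialspace=True strips the space that follows each comma.
for row in csv.reader(io.StringIO(raw), skipinitialspace=True):
    date = datetime.datetime.fromisoformat(row[1])  # string -> datetime
    total = float(row[6])                           # string -> float
    rows.append((date, total))

# Sort chronologically, then split into the x and y lists used for plotting.
rows.sort(key=lambda r: r[0])
x = [r[0] for r in rows]
y = [r[1] for r in rows]

print(y)
```

With real data, `io.StringIO(raw)` would be replaced by the open file handle, and `x` and `y` can be passed to `plt.plot` as in the question.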

RafazZ

Imports and DataFrame

import pandas as pd
import matplotlib.dates as mdates  # used to format the x-axis
import matplotlib.pyplot as plt

# read in the data
df = pd.read_csv('Claims.dat', header=None)

# convert the column to a datetime format, which ensures the data points will be plotted in chronological order
df[1] = pd.to_datetime(df[1], errors='coerce').dt.date

# display(df)
      0           1              2                 3      4      5        6
0  1145  2021-07-17            bob              rome  12.75   65.0   162.75
1  1146  2021-07-12   billy larkin             italy  93.75  325.0  1043.75
2   114  2021-07-28       beatrice              rome   1.00   10.0   100.00
3    29  2021-07-25          Colin   italy the third  10.00   10.0    50.00
4     5  2021-07-22       Veronica            canada  10.00  100.0  1000.00
5  1149  2020-12-13   Billy Larkin              1123  12.75   65.0   162.75

Plotting the DataFrame

# plot the dataframe, which uses matplotlib as the backend
ax = df.plot(x=1, y=6, marker='.', color='r', figsize=(10, 7), label='Totals')

# format title and labels
ax.set_xlabel('Months', color="red", size='large')
ax.set_ylabel('Totals', color="red", size='large')
ax.set_title('Claims Data:   Team Bobby\n Second Place is the First Looser', color='Blue', weight='bold', size='large')

# format ticks
xt = plt.xticks(rotation=45, horizontalalignment='right', size='small')
yt = plt.yticks(weight='bold', size='small', rotation=45)

# format the dates on the xaxis
myFmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_formatter(myFmt)
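If the goal is one value per month from Jan to Dec, one possible variation is to total column 6 per calendar month and reindex so that all twelve months appear in order. This is only a sketch under a few assumptions: it uses the five 2021 rows from the question held in an in-memory string, the name `monthly` is made up for illustration, and the month abbreviations from `calendar.month_abbr` depend on the active locale.

```python
import calendar
import io

import pandas as pd

# The five 2021 rows from the question; io.StringIO stands in for Claims.dat.
raw = """1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
29, 2021-07-25 00:00:00, Colin, italy the third, 10, 10, 50
5, 2021-07-22 00:00:00, Veronica, canada, 10, 100, 1000
"""

df = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)
df[1] = pd.to_datetime(df[1], errors='coerce')
df = df.dropna(subset=[1])  # drop rows whose date could not be parsed

# Sum the totals per month number, then make sure months 1..12 all exist.
monthly = df.groupby(df[1].dt.month)[6].sum().reindex(range(1, 13), fill_value=0.0)

# Replace month numbers with short names ('Jan' ... 'Dec' in an English locale).
monthly.index = [calendar.month_abbr[m] for m in monthly.index]

print(monthly)
```

The resulting Series is already in Jan to Dec order, so something like `monthly.plot(kind='bar')` would label the x-axis with the month names directly.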


Trenton McKinney