Since you build your data points after reading the data from the input file, and you don't want the points with a 0 value on the Y-axis, the simplest fix is to not add them to your data points in the first place.
Without modifying your code too much, it could look like this:
import matplotlib.pyplot as plt
# Read data from file
#data = open("EDS4008_2023_04_12.txt",'r')
input_file = open("input_data.txt")
input_lines = input_file.readlines()
# Extract time and ping values from the data
time = []
ping_time = []
index = 0
for current_line in input_lines:
    #print(string)
    index += 1
    if (" time" in current_line) and (" bytes" in current_line):
        var1 = current_line.split(' ')[8]
        #print(var1)
        if "=" in var1:
            t = var1.split("=")[1].split("m")[0]
            ping_time.append(int(t))
            time.append(index / 60)
        elif "<" in var1:
            t = var1.split("<")[1].split("m")[0]
            ping_time.append(int(t))
            time.append(index / 60)
        else:
            print("error")
            # Don't add this point
            # ping_time.append(0)
            # time.append(index / 60)
    else:
        print("skip this line")
        # Don't add this point
        # ping_time.append(0)
        # time.append(index / 60)
print(ping_time)
print(time)
max_value = max(ping_time)
min_value = min(ping_time)
plt.scatter(time, ping_time)
plt.xlabel('Time (minutes)')
plt.ylabel('Ping Time (ms)')
plt.title('Pinging Duration')
plt.axhline(max_value)
plt.axhline(min_value)
plt.show()
- To draw the horizontal lines for the maximum and minimum readings, the code above uses plt.axhline().
I would also stress that you should avoid giving variables generic names, because some of these names are already used by Python built-ins or the standard library. For example, these names already exist:
- list() is a fundamental class in Python
- time is a module of the standard library
- string is a module of the standard library
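
To make the risk concrete, here is a minimal sketch of what happens when a variable hides one of these names. The function name shadowing_demo is made up for the illustration, and the shadowing is kept inside a function so it does not leak into the rest of the script.

```python
def shadowing_demo():
    """Return the error raised once a local variable hides the built-in list()."""
    list = [1, 2, 3]  # hypothetical bad name: hides the built-in class in this scope
    try:
        list("abc")  # this now calls the list *object*, not the class
    except TypeError as error:
        return str(error)
    return None


message = shadowing_demo()
print(message)  # 'list' object is not callable
```

The same thing happens with a variable called time: once it exists, a later call such as time.sleep() in the same scope would fail because time is no longer the module.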
But we can do better.
Whenever you have to extract data from text, regular expressions are often a useful tool.
import re
import matplotlib.pyplot as plt
matcher = re.compile(r'bytes=\d+ time[=<](?P<duration>\d+)ms')
with open("input_data.txt") as input_file:
    ping_time = []
    timestamps = []
    for (line_number, current_line) in enumerate(input_file):
        if line_matched := matcher.search(current_line):
            timestamps.append(line_number / 60)
            ping_time.append(int(line_matched.group('duration')))
        else:
            print('Skip this line')
print(ping_time)
print(timestamps)
max_value = max(ping_time)
min_value = min(ping_time)
plt.scatter(timestamps, ping_time)
plt.xlabel('Time (minutes)')
plt.ylabel('Ping Time (ms)')
plt.title('Pinging Duration')
plt.axhline(max_value)
plt.axhline(min_value)
plt.show()
To understand this code, you will need to read up on:
- Regular expressions using the re module
- enumerate()
- Using context managers to make sure a file is closed when you don't need it anymore as described in the second example of this section of the official tutorial
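
If it helps, here is a small sketch that runs the same pattern without a file or a plot, so you can see what the regular expression and enumerate() do on their own. The sample lines are made up and only assume that your file looks like typical Windows ping output; adjust them to your real data.

```python
import re

matcher = re.compile(r'bytes=\d+ time[=<](?P<duration>\d+)ms')

# Hypothetical sample lines, assumed to resemble the content of input_data.txt
sample_lines = [
    "Reply from 192.168.1.1: bytes=32 time=12ms TTL=64",
    "Request timed out.",
    "Reply from 192.168.1.1: bytes=32 time<1ms TTL=64",
]

matched = []
for line_number, current_line in enumerate(sample_lines):
    line_matched = matcher.search(current_line)
    if line_matched:
        # group('duration') is the text captured by (?P<duration>\d+)
        matched.append((line_number, int(line_matched.group('duration'))))

print(matched)  # [(0, 12), (2, 1)]
```

Note that enumerate() keeps counting on the line that did not match, which is why the second data point keeps its original position (2) instead of being renumbered.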
A more advanced way of doing it is to combine generator expressions and zip() with regular expressions. The code is a little more compact, and it does not keep the whole content of the file in memory.
import re
import matplotlib.pyplot as plt
matcher = re.compile(r'bytes=\d+ time[=<](?P<duration>\d+)ms')
with open("input_data.txt") as input_file:
    lines_matched = (matcher.search(current_line) for current_line in input_file)
    data_points = (
        (x / 60, int(current_match.group('duration')))
        for (x, current_match) in enumerate(lines_matched)
        if current_match
    )
    timestamps, ping_time = zip(*data_points)
print(ping_time)
print(timestamps)
max_value = max(ping_time)
min_value = min(ping_time)
plt.scatter(timestamps, ping_time)
plt.xlabel('Time (minutes)')
plt.ylabel('Ping Time (ms)')
plt.title('Pinging Duration')
plt.axhline(max_value)
plt.axhline(min_value)
plt.show()
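
The zip(*data_points) step is probably the least obvious part, so here is a minimal sketch of it in isolation. The sample pairs and the helper split_points are made up for the illustration; the helper is just one possible way to handle a file in which no line matches, where unpacking into two names would otherwise raise a ValueError.

```python
# Hypothetical (timestamp, duration) pairs, like the ones data_points yields
data_points = [(0.0, 12), (2 / 60, 1), (3 / 60, 7)]

# zip(*pairs) turns a sequence of pairs into one tuple per column
timestamps, ping_time = zip(*data_points)
print(timestamps)
print(ping_time)  # (12, 1, 7)


def split_points(points):
    """Split pairs into two columns, returning two empty tuples if there are none."""
    columns = tuple(zip(*points))
    return columns if columns else ((), ())


empty_timestamps, empty_ping_time = split_points([])
print(empty_timestamps, empty_ping_time)  # () ()
```

Keep in mind that max() and min() also fail on an empty sequence, so you would still need to check for that case before drawing the two horizontal lines.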
Is this what you were looking for?