How to plot a histogram in python from data in notepad file?

Question

I need to plot a histogram in Python based on data in a Notepad (plain text) file. The file contains 10000 lines, and each line holds ten hypothesis numbers from 0 to 255:

....
....
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]

So my goal is to take the last line, then check how many times each of its numbers is repeated in the whole file.

For example, this is my last line: [205 246 19 118 68 44 72 45 210 162]. I need to plot my histogram based on the number of repetitions of each number in the whole file. I then need to extract its rank:

import matplotlib.pyplot as plt
import numpy as np
fileHandle = open('path_File',"rb" )
lineList = fileHandle.readlines()
fileHandle.close()
print (lineList)
print ("The last line is:")
print (lineList[-1])

This code extracts the last line, but I can't compute how often each number is repeated in the whole file. How can I plot the histogram based on that?

Does the ordering matter? (example last line and first line differ only by the ordering of integers) — Preston Martin, Oct 02 '17 at 13:48
@cᴏʟᴅsᴘᴇᴇᴅ If you are talking about this [] , so yes. thank you in advance — tierrytestu, Oct 02 '17 at 13:49
@PrestonM In fact they are random numbers, so order is not important. — tierrytestu, Oct 02 '17 at 13:50
@tierrytestu so in your example, you would want a histogram of the number of occurrences of 205, 246, 19, etc. in the file. Is that correct? — Preston Martin, Oct 02 '17 at 14:06

Adrien Logut · Answer 1 · 2017-10-02T18:11:45.693

What you have here is a list where each element is a line from your file. If all your lines are formatted the same way (and it seems they are), you can loop over all the lines and use a counter.

import matplotlib.pyplot as plt
import numpy as np
# Open in text mode ("r"): binary mode ("rb") yields bytes objects in Python 3
with open('path_File', "r") as fileHandle:
    # Strip the trailing newline so every line compares cleanly with the last one
    lineList = [line.strip() for line in fileHandle]
print(lineList)
print("The last line is:")
print(lineList[-1])
count = 0
# The [:-1] says that you take all the values but the last one
for line in lineList[:-1]:
    if line == lineList[-1]:
        count += 1

If instead you want to check, for each number in the last line, how many times it is repeated, you need to split the lines. You can use the split function on your strings. Be careful: since each line is wrapped in brackets, remove them before splitting:

last_line = lineList[-1].strip().strip("[]").split()
# This means: take the last item of lineList, drop the surrounding
# whitespace and the brackets, then split on whitespace. Calling
# split() without an argument also copes with the double spaces
# that pad the shorter numbers.

Then do the same in the loop:

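# Initialize a list of counters, one for each element in last_line
counters = [0] * len(last_line)
for line in lineList[:-1]:
    # Same cleaning as for last_line: drop whitespace and brackets, then split
    line = line.strip().strip("[]").split()
    # Compare position by position; min() guards against shorter lines
    for i in range(min(len(line), len(last_line))):
        if line[i] == last_line[i]:
            counters[i] += 1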
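The loop above compares the lines position by position. Since the asker said in the comments that the order of the numbers does not matter, here is a minimal sketch of an order-independent count using collections.Counter. The helper names and the three-line sample are hypothetical, and the sketch assumes the last line itself should be included in the totals ("in the whole file").

```python
from collections import Counter


def parse_line(line):
    """Turn a line such as '[205 246  19]' into ['205', '246', '19']."""
    return line.strip().strip("[]").split()


def count_last_line_values(lines):
    """Count, over all lines, how often each value of the last line occurs."""
    # Skip blank lines so a trailing empty line does not become the "last line"
    rows = [parse_line(line) for line in lines if line.strip()]
    totals = Counter(value for row in rows for value in row)
    # Keep only the values of the last line, in the order they appear there
    return {value: totals[value] for value in rows[-1]}


# Hypothetical three-line sample standing in for the real file contents
sample = [
    "[205 246  19]\n",
    "[246 205  19]\n",
    "[205  19  44]\n",
]
print(count_last_line_values(sample))  # {'205': 3, '19': 3, '44': 1}
```

The values stay strings here, which is enough for counting and for axis labels; convert them with int() if you need to sort or rank them numerically.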

Then, if you want to plot a histogram, have a look at these pages: https://matplotlib.org/devdocs/gallery/pyplots/pyplot_text.html#sphx-glr-gallery-pyplots-pyplot-text-py

https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist
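Because the counts are already computed, a bar chart with one bar per value is probably closer to what is wanted than plt.hist, which bins raw data itself. The following is only a sketch under that assumption: the function names and the example counts are hypothetical, and matplotlib is imported inside the plotting function so the data-preparation helper can be tried without it.

```python
def bar_chart_data(counts):
    """Split a {label: count} dict into x positions, labels and bar heights."""
    labels = list(counts.keys())
    heights = [counts[label] for label in labels]
    positions = list(range(len(labels)))
    return positions, labels, heights


def plot_counts(counts, title="Occurrences of the last line's values"):
    """Draw one bar per value; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt  # imported lazily, see the lead-in

    positions, labels, heights = bar_chart_data(counts)
    plt.bar(positions, heights, align="center")
    plt.xticks(positions, labels)
    plt.xlabel("Value")
    plt.ylabel("Number of occurrences")
    plt.title(title)
    plt.show()


# Hypothetical counts, for example the result of a counting loop
example_counts = {"205": 19, "246": 19, "19": 19}
positions, labels, heights = bar_chart_data(example_counts)
# plot_counts(example_counts)  # uncomment to display the chart
```

Passing lists rather than dict views to plt.bar and plt.xticks keeps the call working on Python 3.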

I found this error: last_line = lineList[-1][1:-1].split(" ") TypeError: 'str' does not support the buffer interface — tierrytestu, Oct 02 '17 at 18:49
you should then change fileHandle = open('path_File',"rb" ) by fileHandle = open('path_File',"r" ). You don't need to open your file in binary mode if it is in plain text. — Adrien Logut, Oct 02 '17 at 19:22
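The TypeError discussed in these comments can be reproduced without a file. This short sketch (variable names are hypothetical) shows that a line read in binary mode is a bytes object in Python 3, which cannot be split with a str separator, and that decoding it first, or opening the file in text mode, avoids the problem.

```python
# A line as it would come back from a file opened with "rb" in Python 3
raw_line = b"[205 246  19]\n"

try:
    raw_line[1:-1].split(" ")  # str separator on a bytes object
    split_failed = False
except TypeError:
    split_failed = True

# Decoding gives a str, which is what a file opened with "r" returns directly
text_line = raw_line.decode("ascii")
tokens = text_line.strip().strip("[]").split()

print(split_failed)  # True
print(tokens)        # ['205', '246', '19']
```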

Anton vBR · Accepted Answer · 2017-10-02T19:03:00.687

Here is an example using the pandas library:

import io  # Python 2: use the StringIO module instead
import pandas as pd
import matplotlib.pyplot as plt

string = """[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]"""

# Here we strip the [] and the whitespace from each line, creating a generator
clean = (i.strip()[1:-1].split() for i in io.StringIO(string))  # Python 2: StringIO.StringIO(string)

# To read from your file instead, uncomment and adapt the two lines below.
# A list is built so that all lines are read before the file is closed.
#with open("path/to/file.txt") as f:
#    clean = [i.strip()[1:-1].split() for i in f]

# Create the dataframe
df = pd.DataFrame(clean)

# Count every item across all columns and put the counts in a dict
dict_count = df.stack().value_counts().to_dict()

# Dict with the counts for the last row only (based on dict_count)
dict_values = {i: dict_count[i] for i in df.tail(1).values[0].tolist()}

# Plot a bar chart
# https://stackoverflow.com/questions/16010869/python-plot-a-bar-using-matplotlib-using-a-dictionary
# list() is needed on Python 3, where values() and keys() return views
plt.bar(range(len(dict_values)), list(dict_values.values()), align='center')
plt.xticks(range(len(dict_values)), list(dict_values.keys()))

plt.show()

This is exactly what I want to do but my data is extracted from a file: So it gives me this error: clean = (i.strip()[1:-1].split() for i in io.StringIO(lineList)) TypeError: initial_value must be str or None, not bytes — tierrytestu, Oct 02 '17 at 18:57
I guess you are using py2, so I changed the code slightly. The graph gets a strange look though. But the idea is correct. — Anton vBR, Oct 02 '17 at 18:58
I am sure that it gives the correct result. The problem appears when I replace the string with my data from the file: fileHandle = open('Path_File.txt',"r" ) string = fileHandle.readlines() print (string). Then I get this error: TypeError: initial_value must be str or None, not list. When I use the string as you declared it, the code works correctly. — tierrytestu, Oct 02 '17 at 20:47
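The error in the last comment arises because readlines() returns a list of strings, while io.StringIO expects a single string. A minimal sketch of two ways around it follows. The temporary file and the helper name clean_rows are hypothetical and only there to make the example runnable as-is.

```python
import io
import os
import tempfile


def clean_rows(lines):
    """Strip brackets and whitespace from each line and split it into tokens."""
    return [line.strip().strip("[]").split() for line in lines if line.strip()]


# Write a small hypothetical file so the sketch can be run as-is
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[205 246  19]\n[246 205  19]\n")
    path = tmp.name

with open(path, "r") as f:  # text mode, so each line is a str
    lines = f.readlines()   # a list of str, not one single str
os.remove(path)

# Option 1: iterate over the list directly, no StringIO needed
rows_from_list = clean_rows(lines)

# Option 2: join the list back into one string if StringIO is really wanted
rows_from_stringio = clean_rows(io.StringIO("".join(lines)))

print(rows_from_list)  # [['205', '246', '19'], ['246', '205', '19']]
```

Either result can then be passed to pd.DataFrame in place of the clean generator from the answer.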
