I need to plot a histogram in Python based on data in a text file. The file contains 10000 lines, and each line holds ten hypothesis numbers between 0 and 255:

....
....
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]

My goal is to take the last line and count how many times each of its numbers occurs in the whole file.

For example, this is my last line: [205 246 19 118 68 44 72 45 210 162]. I need to plot a histogram of the number of occurrences of each of these numbers across the whole file. I then need to extract its rank:

import matplotlib.pyplot as plt
import numpy as np
fileHandle = open('path_File',"rb" )
lineList = fileHandle.readlines()
fileHandle.close()
print (lineList)
print ("The last line is:")
print (lineList[-1]) 

With this code I can extract the last line, but I can't count the occurrences of each number across the whole file. How can I plot the histogram based on that?

DavidG
tierrytestu

2 Answers

What you have here is a list where each element is a line from your file. If all your lines are formatted the same way (and it seems they are), you can loop over all the lines and use a counter to see how many of them are identical to the last one.

import matplotlib.pyplot as plt
import numpy as np

# Open in text mode ("r"): binary mode ("rb") returns bytes in Python 3
with open('path_File', "r") as fileHandle:
    lineList = fileHandle.readlines()
print(lineList)
print("The last line is:")
print(lineList[-1])
count = 0
# The [:-1] says that you take all the lines but the last one
for line in lineList[:-1]:
    # strip() so a missing final newline does not break the comparison
    if line.strip() == lineList[-1].strip():
        count += 1

If instead you want to check, for each number in the last line, how many times it is repeated, you need to split the lines. You can use the split method on your strings. Be careful: since each line is wrapped in brackets, remove the first and last characters:

last_line = lineList[-1].strip()[1:-1].split()
# strip() removes the trailing newline, [1:-1] drops the
# surrounding brackets, and split() without an argument splits
# on any run of whitespace, so "246  19" yields no empty items

Then do the same in the loop:

# Initialize a list of counters, one for each element in last_line
counters = [0] * len(last_line)
for line in lineList[:-1]:
    line = line.strip()[1:-1].split()
    for i in range(len(last_line)):
        # counters[i] counts the lines whose i-th number matches
        # the i-th number of the last line
        if line[i] == last_line[i]:
            counters[i] += 1

If you then want to plot a histogram, have a look at these pages: https://matplotlib.org/devdocs/gallery/pyplots/pyplot_text.html#sphx-glr-gallery-pyplots-pyplot-text-py

https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist
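
As a possible complement to the per-position comparison above, here is a minimal sketch that counts how many times each number of the last line occurs anywhere in the file, which is one reading of what the question asks for. The helper names (parse_line, count_last_line_numbers, plot_counts) and the three-line sample are hypothetical, introduced only for illustration.

```python
from collections import Counter


def parse_line(line):
    """Turn a line such as '[205 246  19]' into [205, 246, 19]."""
    return [int(token) for token in line.strip().strip("[]").split()]


def count_last_line_numbers(lines):
    """Count how often each number of the last row occurs in all rows."""
    rows = [parse_line(line) for line in lines if line.strip()]
    totals = Counter(number for row in rows for number in row)
    return {number: totals[number] for number in rows[-1]}


def plot_counts(counts):
    """Draw one bar per number; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    positions = range(len(counts))
    plt.bar(positions, list(counts.values()), align="center")
    plt.xticks(positions, [str(number) for number in counts])
    plt.xlabel("number in the last line")
    plt.ylabel("occurrences in the file")
    plt.show()


# Hypothetical three-line sample standing in for the real file
sample = ["[205 246  19]\n", "[246 205  19]\n", "[205 246  44]\n"]
counts = count_last_line_numbers(sample)
print(counts)
```

With a real file, the same functions could be called as plot_counts(count_last_line_numbers(f)) inside a with open('path_File') as f: block, since a file opened in text mode can be iterated line by line.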

Adrien Logut
  • I found this error: last_line = lineList[-1][1:-1].split(" ") TypeError: 'str' does not support the buffer interface – tierrytestu Oct 02 '17 at 18:49
  • you should then change fileHandle = open('path_File',"rb" ) by fileHandle = open('path_File',"r" ). You don't need to open your file in binary mode if it is in plain text. – Adrien Logut Oct 02 '17 at 19:22
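
The difference between the two modes discussed in these comments can be reproduced with a small sketch. It assumes Python 3 and writes a hypothetical one-line sample to a temporary file; the variable names are made up for the illustration.

```python
import os
import tempfile

# Write a hypothetical one-line sample file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[205 246  19]\n")
    path = tmp.name

# Binary mode gives bytes objects, text mode gives str objects
with open(path, "rb") as handle:
    binary_line = handle.readlines()[-1]
with open(path, "r") as handle:
    text_line = handle.readlines()[-1]
os.remove(path)

# Splitting bytes with a str separator raises TypeError in Python 3
binary_error = None
try:
    binary_line[1:-1].split(" ")
except TypeError as error:
    binary_error = error

# The text-mode line can be cleaned and split as expected
text_tokens = text_line.strip()[1:-1].split()

print(type(binary_line).__name__, type(text_line).__name__)
print(text_tokens)
```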

Here is an example using the pandas library:

import io  # Python 2: import StringIO
import pandas as pd
import matplotlib.pyplot as plt

string = """[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]"""

# Strip the brackets and split each line on whitespace (lazily, via a generator)
clean = (i.strip()[1:-1].split() for i in io.StringIO(string))  # Python 2: StringIO.StringIO(string)

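# To read from your own file instead, uncomment and adapt these lines.
# Loop over the file directly: io.StringIO() accepts neither bytes
# nor the list returned by readlines(). A list (not a generator) is
# built so that the file is fully read before it is closed.
#with open("path/to/file.txt") as f:
#    clean = [i.strip()[1:-1].split() for i in f]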

# Create the dataframe
df = pd.DataFrame(clean)

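# Count every item in the whole table and put the counts in a dict
# (avoids the top-level pd.value_counts, deprecated in recent pandas)
dict_count = pd.Series(df.values.ravel()).value_counts().to_dict()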

# Dict with last row count (based on dict_count)
dict_values = {i:dict_count[i] for i in df.tail(1).values[0].tolist()}

# Plot the counts as a bar chart
# https://stackoverflow.com/questions/16010869/python-plot-a-bar-using-matplotlib-using-a-dictionary
# list() is needed on Python 3, where values() and keys() return views
plt.bar(range(len(dict_values)), list(dict_values.values()), align='center')
plt.xticks(range(len(dict_values)), list(dict_values.keys()))

plt.show()

Anton vBR
  • This is exactly what I want to do but my data is extracted from a file: So it gives me this error: clean = (i.strip()[1:-1].split() for i in io.StringIO(lineList)) TypeError: initial_value must be str or None, not bytes – tierrytestu Oct 02 '17 at 18:57
  • I guess you are using py2, so I changed the code slightly. The graph gets a strange look though. But the idea is correct. – Anton vBR Oct 02 '17 at 18:58
  • I am sure that it gives me the correct solutions, the problem is when I change the string with my data from the file: fileHandle = open('Path_File.txt',"r" ) string = fileHandle.readlines() print (string), So I got this error TypeError: initial_value must be str or None, not list , when I put the string as /you have declared it, the code works correctly. – tierrytestu Oct 02 '17 at 20:47
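
The last error in this thread comes from passing the list returned by readlines() to io.StringIO, which only accepts a single string. A minimal sketch of the two ways around it, assuming Python 3 and a hypothetical two-line sample in place of the real file contents:

```python
import io

# What fileHandle.readlines() returns: a list of strings, one per line
lines = ["[205 246  19]\n", "[246 205  19]\n"]

# io.StringIO() wants one str, so a list raises TypeError
message = None
try:
    io.StringIO(lines)
except TypeError as error:
    message = str(error)

# Option 1: skip StringIO and loop over the list (or the file) directly
rows_from_list = [line.strip()[1:-1].split() for line in lines]

# Option 2: join the lines back into one string before wrapping it
buffer = io.StringIO("".join(lines))
rows_from_buffer = [line.strip()[1:-1].split() for line in buffer]

print(message)
print(rows_from_list)
```

Either list of rows can then be passed to pd.DataFrame in place of the clean generator used in the answer.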