I have a piece of code that first decodes a .dat file into a .txt file using a cycling binary cipher decoder. The result is a text file of over 500 lines: lines 0-65 are titles and other display features, and the last few lines, starting from 586, are wrongly decoded text that looks something like:

ßÅBÎheÀœaÜ;sî3TÐêM·Zì?pêI†Q’&×¥ü#ÇPËiPì¿j–hñHžíoî#ˆ[ÿ>BÿÃ@ÌhcP¿_ÔkõOˆEñlÀ‹J–>tò5Ægã_ð: yŽ6aÎ “uôhaù*°Dý4}Ó´Qá4wÙ žZôØ ‘~êlHí–’/mÑ=žt k×£QÉoû·]Ý&õC´Jœ9mû»ZÃ+]þ6ƒ[ቶS;Uö¥Wã Lè:ÂXÿ4sÈÄAïPó€Dó$EØÙ•dДeïkHâN xÐj@Ø"”eë1aõÅCÒ7ùC–ñiÐCÑP‹Æ Ñ ]ô†}ÌdDñ  Ë,WÎÄdó^ã8žDäÓ)Çq9}ùÃfÄP÷ÇzîoiÒ ÁpìeSÖ€ÒMŒÀ“;Bö

I am using the code:

with open (file) as f:  
    xpoints, ypoints, gradient = np.loadtxt(itertools.islice(f,68, 584), delimiter=',', unpack=True)

in order to load only the lines that contain the data points I am after.

For some reason, however, this causes the program to throw an error saying it can't decode a byte because it maps to undefined. I have confirmed that it is caused by the junk text at the bottom, and it seems to be thrown on the line shown above, but I cannot figure out why, as that line shouldn't need to read the junk lines at all.

Full error:

  File "C:\Users\brady\Desktop\Slider_All\Slide-Mobile.py", line 57, in <module>
    xpoints, ypoints, gradient = np.loadtxt(IT.islice(f,68, 500), delimiter=',', unpack=True)
  File "C:\Users\brady\AppData\Local\Programs\Python\Python38-32\lib\site-packages\numpy\lib\npyio.py", line 1159, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Users\brady\AppData\Local\Programs\Python\Python38-32\lib\site-packages\numpy\lib\npyio.py", line 1075, in read_data
    for i, line in enumerate(line_iter):
  File "C:\Users\brady\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7758: character maps to undefined

Does itertools.islice or numpy.loadtxt possibly attempt to read the whole document before it takes the slice and run into a problem there, or is this something else entirely that I'm missing? I will post my entire unedited code below for completeness' sake. Thank you for any and all help.

import matplotlib.animation as animation
from matplotlib.widgets import Slider, Button
import matplotlib as mpl
from matplotlib import pyplot as plt
import scipy.interpolate as inter
import numpy as np
import itertools as IT
from itertools import cycle
from scipy.interpolate import interp1d
import os


file = 'example.dat'
x1 = 250    #Lower bound    Bigger bound leads to lots of lag
x2 = 300    #Upper bound    Recommended to remain close to range of 50

#=========================================================================================================================
start = []  #Stores data before given points
end = []    #Stores data after given points
files = []  #Stores text file to be removed when done

#This function decodes and re-encodes the dat files 
class Decoder:
    def decode(fn_in, fn_out):
        CIPHER = cycle([0b01011100, 0b00100111, 0b10111010, 0b01111011, 0b11110010, 0b00110010, 0b10100101])
        with open(fn_in, 'rb') as fin, open(fn_out, 'wb') as fout:        
            fout.write(fin.read(14))
            byte = fin.read(1)
            while byte:
                fout.write( ( int.from_bytes(byte, 'big') ^ next(CIPHER) ).to_bytes(1, 'big') )
                byte = fin.read(1)

    def to_txt(filename):
        #global files
        if filename [-3:] == "dat":        
            Decoder.decode( filename, filename[:-3] + "txt" )
            filename = filename[:-3] + "txt"    
        else:
            print("Extension not recognised for input filename \""+str(filename)+"\", skipping...")

        return filename

    def to_dat(filename):
        files.append(filename)
        if filename [-3:] == "txt":    
                Decoder.decode( filename, filename[:-3]+ "dat" )   #was tempfile[:-3], but tempfile is never defined
                #file.append(filename[:-3] + "dat")      
        else:
            print("Extension not recognised for input filename \""+str(filename)+"\", skipping...")

if file[-3:] == "dat":
    file = Decoder.to_txt(file) #Converts .dat to .txt
    files.append(file)

#Gets all data points from file
with open (file) as f:  
    xpoints, ypoints, gradient = np.loadtxt(IT.islice(f,68, 584), delimiter=',', unpack=True)


#get a list of points to fit a spline to as well
xmin = min(xpoints) 
xmax = max(xpoints) 

#Calculates which lines of data are required to plot
X1 = int(516*((x1 - xmin)/(xmax-xmin))) + 68
X2 = int(516*((x2 - xmin)/(xmax-xmin))) + 68

#Gets specific lines and saves the rest to copy back later
with open (file) as f:
    xp, ypoints, gradient = np.loadtxt(IT.islice(f,X1, X2), delimiter=',', unpack=True)
with open(file) as f:
    for line in IT.islice(f,0,X1):
        start.append(line)
with open (file) as f:
    for line in IT.islice(f,X2,584):
        end.append(line)

#Sets amount of data points to plot, must be multiple of point range
#The lower the number the more accurate the plot but the slower it will run 
N = len(xp)

if N < 200:
    j = 1                   
elif N < 400:
    j = 1
else: j = 1 

x = xp[::j]
yvals = ypoints[::j]
N = len(x)
xnew = xp

#spline fit
spline = inter.InterpolatedUnivariateSpline (x, yvals)

#set up a plot
fig,axes = plt.subplots(1,1,figsize=(12.0,4.0),sharex=True)
axes.set_position([0.05,0.08,0.93,0.80])
ax1 = axes

pind = None #active point
epsilon = 5 #max pixel distance
#Updates plot when point is dragged
def update(val):
    global yvals
    global spline
    # update curve
    for i in np.arange(N):
      yvals[i] = sliders[i].val 
    l.set_ydata(yvals)
    spline = inter.InterpolatedUnivariateSpline (x, yvals)
    m.set_ydata(spline(X))
    # redraw canvas while idle
    fig.canvas.draw_idle()

#Resets plot back to original save from when opened
def reset(event):
    global yvals
    global spline
    #reset the values
    yvals = ypoints
    for i in np.arange(N):
      sliders[i].reset()
    spline = inter.InterpolatedUnivariateSpline (x, yvals)
    l.set_ydata(yvals)
    m.set_ydata(spline(X))
    # redraw canvas while idle
    fig.canvas.draw_idle()

#Overwrites current save with new plot
def save(event):
    f = interp1d(x, yvals, kind='cubic')
    ynew = f(xnew)
    ax1.plot(xnew,ynew)

    newfile = np.vstack((xnew,ynew, gradient)).T

    with open(file, 'w') as f:
        for item in start:
            f.write("%s" % item)
        np.savetxt(f, newfile, delimiter = ',')
        for item in end:
            f.write("%s" % item)
        #f.write('""')
    Decoder.to_dat(file) #Converts .txt to .dat

#Event handler for mouse click
def button_press_callback(event):
    'whenever a mouse button is pressed'
    global pind
    if event.inaxes is None:
        return
    if event.button != 1:
        return
    #print(pind)
    pind = get_ind_under_point(event)    

#Event handler for mouse release
def button_release_callback(event):
    'whenever a mouse button is released'
    global pind
    if event.button != 1:
        return
    pind = None

#Gets clicked point number
def get_ind_under_point(event):
    'get the index of the vertex under point if within epsilon tolerance'

    # display coords
    #print('display x is: {0}; display y is: {1}'.format(event.x,event.y))
    t = ax1.transData.inverted()
    tinv = ax1.transData 
    xy = t.transform([event.x,event.y])
    #print('data x is: {0}; data y is: {1}'.format(xy[0],xy[1]))
    xr = np.reshape(x,(np.shape(x)[0],1))
    yr = np.reshape(yvals,(np.shape(yvals)[0],1))
    xy_vals = np.append(xr,yr,1)
    xyt = tinv.transform(xy_vals)
    xt, yt = xyt[:, 0], xyt[:, 1]
    d = np.hypot(xt - event.x, yt - event.y)
    indseq, = np.nonzero(d == d.min())
    ind = indseq[0]

    #print(d[ind])
    if d[ind] >= epsilon:
        ind = None

    #print(ind)
    return ind

#Event handler for mouse movement
def motion_notify_callback(event):
    'on mouse movement'
    global yvals
    if pind is None:
        return
    if event.inaxes is None:
        return
    if event.button != 1:
        return

    #update yvals
    #print('motion x: {0}; y: {1}'.format(event.xdata,event.ydata))
    yvals[pind] = event.ydata 

    # update curve via sliders and draw
    sliders[pind].set_val(yvals[pind])
    fig.canvas.draw_idle()



X = xp
ax1.plot (X, ypoints, 'k--', label='original')
l, = ax1.plot (x,yvals,color='k',linestyle='none',marker='o',markersize=8)
m, = ax1.plot (X, spline(X), 'r-', label='spline')

if max(ypoints) > 0:
    yheight = 0.01*max(ypoints)
    ylower =0
else: 
    yheight = -0.1*max(ypoints)
    ylower = yheight    

ax1.set_yscale('linear')
ax1.set_xlim(x1, x2)
ax1.set_ylim(min(ypoints)-ylower,max(ypoints)+yheight)
ax1.grid(True)
ax1.yaxis.grid(True,which='minor',linestyle='--')


sliders = []

for i in np.arange(N):

    axamp = plt.axes([0.84, -1, 0.12, 0.01])
    # Slider
    s = Slider(axamp, 'p{0}'.format(i), -100, 10, valinit=yvals[i])
    sliders.append(s)


for i in np.arange(N):
    #samp.on_changed(update_slider)
    sliders[i].on_changed(update)

axres = plt.axes([0.84, 0.90, 0.15, 0.08])
bres = Button(axres, 'Reset')
bres.on_clicked(reset)

axsave = plt.axes([0.68, 0.90, 0.15, 0.08])
bsave = Button(axsave, 'Save')
bsave.on_clicked(save)


fig.canvas.mpl_connect('button_press_event', button_press_callback)
fig.canvas.mpl_connect('button_release_event', button_release_callback)
fig.canvas.mpl_connect('motion_notify_event', motion_notify_callback)

plt.show()
for filename in files:
    os.remove(filename)

EDIT: I now believe the error is almost definitely tied to the itertools.islice command, as I have found a similar issue here: Python 3 itertools.islice continue despite UnicodeDecodeError.
I am currently researching an alternate way to open the file, as changing the decoding style for the .dat file is not possible at this stage.
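
For reference, a minimal sketch that seems to reproduce the behaviour without numpy. It assumes a small throwaway file (the contents and names below are made up for illustration) with clean rows first and one undefined cp1252 byte (0x81) on the last line. A file opened in text mode is read and decoded in buffered chunks rather than line by line, so asking for only the first few lines can still push the junk bytes through the decoder:

```python
import itertools
import os
import tempfile

# Hypothetical stand-in for the decoded .txt file: clean rows, junk last.
payload = b"1.0,2.0,3.0\n" * 5 + b"\x81 junk\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode with cp1252, the codec named in the traceback above.
    with open(path, encoding="cp1252") as f:
        try:
            # Only the first three (clean) lines are requested...
            rows = list(itertools.islice(f, 0, 3))
            outcome = "ok"
        except UnicodeDecodeError:
            # ...but the chunk handed to the decoder also holds byte 0x81.
            outcome = "UnicodeDecodeError"
finally:
    os.remove(path)

print(outcome)
```

If this reading is right, islice itself is not looking ahead; the text-mode file object underneath it is decoding a block of bytes that happens to include the junk.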

Brady Gale
  • Please post the full error message. "Does `itertools.islice` or `numpy.loadtxt` possibly attempt to read the whole document first before it takes the slice" - no, `numpy.loadtxt` doesn't even know anything about this file because it's being fed a generator. Also note that [this generator must yield `bytes` in Python 3](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html). – ForceBru Feb 18 '20 at 21:52
  • Alright, I have added the full error message to my original post. Thank you for this info. I'm not sure what the issue is now, so I will continue looking. – Brady Gale Feb 18 '20 at 22:03

1 Answer

I have solved the issue using the solution posted here: https://stackoverflow.com/a/31113251/10475989

My final code is:

types_of_encoding = ["utf8", "cp1252"]
for encoding_type in types_of_encoding:
    with open (file, 'r', encoding = encoding_type, errors='ignore') as f:
        xpoints, ypoints, gradient = np.loadtxt(IT.islice(f,65, 582), delimiter=',', unpack=True)
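
An alternative I have not tested against the real file, sketched here under assumptions: open the file in binary mode so nothing is decoded while reading, slice the lines, and decode only the selected ones. The helper name `read_selected_lines`, the ASCII default, and the sample data are all hypothetical:

```python
import itertools
import os
import tempfile

def read_selected_lines(path, start, stop, encoding="ascii"):
    """Return lines [start, stop) as str, decoding only those lines."""
    # Binary mode: lines are split on b"\n" with no decoding, so junk
    # bytes outside the requested range never reach a codec.
    with open(path, "rb") as f:
        return [raw.decode(encoding) for raw in itertools.islice(f, start, stop)]

# Hypothetical sample: one title line, four data rows, then junk bytes.
payload = b"title\n" + b"1.0,2.0,3.0\n" * 4 + b"\x81\x8d junk\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    lines = read_selected_lines(path, 1, 5)
finally:
    os.remove(path)

# Parse the comma-separated values; float() tolerates the trailing newline.
rows = [[float(value) for value in line.split(",")] for line in lines]
print(rows)
```

The resulting list of strings could presumably be passed straight to np.loadtxt(lines, delimiter=',', unpack=True), since loadtxt is documented to accept a list of str. Compared with errors='ignore', this does not silently drop bytes, and it avoids looping over encodings, where the second pass simply overwrites the result of the first.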
Brady Gale