Minimal reproducible example below. Only goto_index() is taken from my actual code; the rest is scaffolding and should be self-explanatory :

import pickle,os

def goto_index(idx_str,src,dest=False) :
    '''Go to index :
       1. Convert 1-based comma separated digits in idx_str into 0-based list containing each index digit as int.
       2. Starting from current position of src, iterate until index[0] matches current object's position.
          If matched, try to index the object as given. If not matched, function raises EOFError. If index illegal
          in object, function raises IndexError. If object found and index found in object, return value found and
          seek src to beginning of object @ index.
       3. If dest is specified, all values until index will be copied to it from its current position.
          If element is not found in src, the result will be that all elements from src's current position
          to EOF are copied to dest.
    '''

    index = [int(subidx)-1 for subidx in idx_str.split(',')]
    val = None
    obj_cnt = -1                              # 0-based count

    try :
        while True :                          # EOFError if index[0] >= EOF point
            obj = pickle.load(src)
            obj_cnt += 1
            if obj_cnt == index[0] :
                val = obj
                for subidx in index[1::] :
                    val = val[subidx]         # IndexError if index illegal
                src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR) # Seek to start of object at index
                return val
            elif dest : pickle.dump(obj,dest)
    except (EOFError,IndexError) : raise      # Caller will handle exceptions

def add_elements(f) :
    pickle.dump('hello world',f)
    pickle.dump('good morning',f)
    pickle.dump('69 420',f)
    pickle.dump('ending !',f)


def get_elements(f) :
    elements = []
    # Actual code similarly calls goto_index() in ascending order of indices, avoiding repeated seeks.
    for idx_str in ('1','2','3') : 
        elements.append(goto_index(idx_str,f))
    return elements

with open("tmp","wb+") as tmp :
    add_elements(tmp)
    tmp.seek(0)                               # Rewind, otherwise the first pickle.load() hits EOF
    print(', '.join(get_elements(tmp)))

    '''Expected output : hello world, good morning, 69 420
       Actual output   : hello world, good morning, ending !
       Issue : When asking for 3rd element, 3rd element skipped, 4th returned, why ?
    '''

EDIT : The issue is that goto_index() resets obj_cnt to -1 on every call, even though the file position carries over from the previous call. How can I mitigate this ?

user426

1 Answer

The problem was a combination of :

  • obj_cnt did not persist between function calls, so counting always started from scratch even though each call left the file position modified. goto_index() therefore acted as though it was at BOF when it was actually much further ahead.
  • Seeking back to the start of the object at the index (src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR)) caused the next read to return the same object again. If only the first bug were fixed, goto_index() would always go to and return the object at the index from its very first call.
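The interaction of the two points above can be reproduced in memory. This is only a sketch: load_nth() is a hypothetical stripped-down stand-in for goto_index() (no sub-indexing, no dest), and io.BytesIO replaces the temporary file.

```python
import io
import pickle

def load_nth(src, n):
    """Return the n-th object (0-based) from src, restarting the count on every call."""
    count = -1                          # reset on each call, like obj_cnt in the question
    while True:
        obj = pickle.load(src)          # raises EOFError once src is exhausted
        count += 1
        if count == n:
            # Seek back to the start of the matched object, as the question's code does
            src.seek(src.tell() - len(pickle.dumps(obj)))
            return obj

buf = io.BytesIO()
for item in ('hello world', 'good morning', '69 420', 'ending !'):
    pickle.dump(item, buf)
buf.seek(0)

results = [load_nth(buf, n) for n in (0, 1, 2)]
print(results)   # ['hello world', 'good morning', 'ending !']
```

Each call starts counting at the object the previous call matched, so asking for index n returns the object n places after the previous match instead of the n-th object in the file.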

I fixed it by a) putting the function in a class where it can access a count variable, b) adding an additional flag fp_set and only seeking back if it is set to a true value, c) providing a reset() method in the class so as to reset obj_cnt to -1 when done with an ordered series of queries.

Keep in mind that I am very new to OOP in Python, so the code below may not be idiomatic :

class goto_index:
    obj_cnt = -1 # 0-based count
    
    @staticmethod
    def sorted(idx_str,src,dest=None,fp_set=False) :
        # Use when going to indexes in ascending order in a loop
        # idx_str = comma-separated 1-based index, eg : "7,8" is the 8th element of the 7th object
        # src     = file object to search in, from its current position
        # dest    = if given, file object that receives a copy of every object read before the match (or until EOF)
        # fp_set  = if True, will seek such that next read will return obj @ idx_str
    
        index = [int(subidx)-1 for subidx in idx_str.split(',')]
        # Make 0-based int list from 1-based csv string 
        val = None
        try :
            while True :                            # EOFError if not found
                obj = pickle.load(src)
                goto_index.obj_cnt += 1             # increment counter
                if goto_index.obj_cnt == index[0] : # 1st element of index is object number
                    val = obj
                    for subidx in index[1::] :      # Index the object itself
                        val = val[subidx]           # IndexError if illegal index
                    if fp_set :
                        # Seek back to beginning of object in src and un-count it,
                        # so the counter stays in sync with the file position
                        src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR)
                        goto_index.obj_cnt -= 1
                    return val                      # Return value @ index
                elif dest : pickle.dump(obj,dest)   # Copy object to dest
        except (EOFError, IndexError) : raise       # Caller handles these 

    @staticmethod
    def reset():
        goto_index.obj_cnt = -1

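    @staticmethod
    def random(idx_str,src,dest=None,fp_set=False) :
        goto_index.reset() # Start counting from the first object
        src.seek(0)        # Start reading from the beginning of the file
        try :
            # Pass the caller's dest and fp_set through and return the value found
            return goto_index.sorted(idx_str,src,dest,fp_set)
        finally :
            goto_index.reset() # Clear count, even if an exception was raised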

The question's other functions stay basically the same, except get_elements(), which becomes fetch_elements() :

def fetch_elements(f) :
    elements = []
    for idx_str in ('1','2','3') : # Indexes are passed sorted
        elements.append(goto_index.sorted(idx_str,f))
    goto_index.reset()            # Required if using the methods later
    return elements
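
For comparison, here is a sketch of the same idea with the counter stored on an instance rather than on the class, so each open file gets its own count and no reset() call is needed. This is not the code I actually use: PickleStream and goto() are hypothetical names, and the sketch records src.tell() before each load instead of re-pickling the object to compute the seek distance.

```python
import io
import pickle

class PickleStream:
    """Hypothetical helper: reads consecutive pickles and tracks how many were consumed."""

    def __init__(self, src):
        self.src = src
        self.next_idx = 0                       # 0-based number of the object the next load returns

    def goto(self, idx_str, dest=None, fp_set=False):
        index = [int(part) - 1 for part in idx_str.split(',')]
        if index[0] < self.next_idx:
            raise ValueError('indices must be requested in ascending order')
        while True:
            start = self.src.tell()             # offset where this object begins
            obj = pickle.load(self.src)         # EOFError if the object does not exist
            self.next_idx += 1
            if self.next_idx - 1 == index[0]:
                val = obj
                for subidx in index[1:]:
                    val = val[subidx]           # IndexError if the sub-index is illegal
                if fp_set:
                    self.src.seek(start)        # next load returns this object again
                    self.next_idx -= 1          # keep the count in sync with the position
                return val
            if dest is not None:
                pickle.dump(obj, dest)          # copy skipped objects to dest

buf = io.BytesIO()
for item in ('hello world', 'good morning', '69 420', 'ending !'):
    pickle.dump(item, buf)
buf.seek(0)

stream = PickleStream(buf)
elements = [stream.goto(idx_str) for idx_str in ('1', '2', '3')]
print(', '.join(elements))   # hello world, good morning, 69 420
```

Creating a new PickleStream after seeking to 0 plays the role of reset() here.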
user426