How to manage list indices when processing text of indeterminate length?

Question

First: this may look like it has already been answered. Similar problems have been described and answered here, but I think this one is different enough to merit its own question (sorry if I'm wrong). That's why I'm writing a fairly detailed explanation below. Sorry to be long-winded; I'd rather be too detailed.

I'm trying to process a large number of .txt files. For each one, I go through it, find every instance of a target word, and then write that word and the 10 words on either side of it to a .csv file for analysis (to get a feel for the context the words are used in).

I want each word to land in its own cell for later analysis. So, in the .csv handling portion, I have it record the words at single indices leading up to the keyword, and then at single indices moving away from it, 10 in each direction. This works like a charm unless the word I'm targeting is within 10 indices of the start or the end of the document.

If it is, it raises "IndexError: list index out of range".

I've seen helpful explanations on here for building an index list and dealing with indexing that overruns the list (Python Loop: List Index Out of Range), but my problem is that I need (well, I'd like, and hope I'm able) the program to keep requesting each index and get back ' ' when it hits the beginning or end of the file, instead of running into a wall.

* For brevity's sake, here is the chunk of code that sets up the indexing, followed by the chunk that does the index querying; they're not actually stacked like this in the code. The parentheticals here may be off by a space. I don't think that's pertinent, but I thought I'd approximate in case I am, as usual, wrong. *

    for index in range(len(up_file_split_raw)):
        if keyword.match(up_file_split_raw[index]):
            start = max(0, index-assoc_wrd_range)
            finish = min(len(up_file_split_raw), index+assoc_wrd_range+1)
            assocd_wrd_list = string.join(up_file_split_raw[start:finish])

         Break in Code

            row_vals_2 = {
                'Assoc_1':(up_file_split_raw[start:index][0]),
                'Assoc_2':(up_file_split_raw[start:index][1]),
                'Assoc_3':(up_file_split_raw[start:index][2]),
                'Assoc_4':(up_file_split_raw[start:index][3]),
                'Assoc_5':(up_file_split_raw[start:index][4]),
                'Assoc_6':(up_file_split_raw[start:index][5]),
                'Assoc_7':(up_file_split_raw[start:index][6]),
                'Assoc_8':(up_file_split_raw[start:index][7]),
                'Assoc_9':(up_file_split_raw[start:index][8]),
                'Assoc_10':(up_file_split_raw[start:index][9]),
                'KeyWord':(up_file_split_raw[index]),
                'Assoc_11':(up_file_split_raw[index+1:finish][0]),
                'Assoc_12':(up_file_split_raw[index+1:finish][1]),
                'Assoc_13':(up_file_split_raw[index+1:finish][2]),
                'Assoc_14':(up_file_split_raw[index+1:finish][3]),
                'Assoc_15':(up_file_split_raw[index+1:finish][4]),
                'Assoc_16':(up_file_split_raw[index+1:finish][5]),
                'Assoc_17':(up_file_split_raw[index+1:finish][6]),
                'Assoc_18':(up_file_split_raw[index+1:finish][7]),
                'Assoc_19':(up_file_split_raw[index+1:finish][8]),
                'Assoc_20':(up_file_split_raw[index+1:finish][9]),
            }

As a start, instead of hardcoding every line of the lookups, you might want to look them up dynamically in a loop (i.e. `row_vals_2['Assoc_%d' % idx] = up_file_split_raw[start:index][idx]` within a loop that sets values of `idx`). Then you could dynamically adjust the range of values that `idx` iterates over. — Amber, Jan 22 '19 at 00:06
There are some other things you could do to make your code more readable and less verbose, as well - for instance, instead of typing out `up_file_split_raw[start:index]` each time, compute it once and store it in a variable (say `segment = up_file_split_raw[start:index]`) and then just use `segment[idx]` later on. — Amber, Jan 22 '19 at 00:22
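
A rough sketch of what the two comments above describe might look like this. The function name `build_row` and the `span` parameter are made up for illustration, and it assumes the rows end up in a `csv.DictWriter` (the question doesn't show the writer). The slices are computed once, and each loop only runs over the words that actually exist, so nothing is indexed out of range.

```python
def build_row(words, index, span=10):
    """Build one row dict for the keyword at words[index].

    Hypothetical helper: only the Assoc_N keys that have a word are set.
    """
    start = max(0, index - span)
    before = words[start:index]                  # computed once, may be short
    after = words[index + 1:index + 1 + span]    # slice end is clipped for us

    row = {'KeyWord': words[index]}
    for idx in range(len(before)):               # range adapts to what exists
        row['Assoc_%d' % (idx + 1)] = before[idx]
    for idx in range(len(after)):
        row['Assoc_%d' % (span + idx + 1)] = after[idx]
    return row


words = 'the quick brown fox jumps over the lazy dog'.split()
row = build_row(words, 1, span=3)   # keyword 'quick', one word before it
```

If the row is then passed to `csv.DictWriter.writerow`, any fieldname missing from the dict is written as the writer's `restval`, which defaults to an empty string, so the short rows still line up in the file.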

score 1 · Answer 1 · answered Jan 22 '19 at 01:22

Use slices; they clip to the list's index bounds instead of raising IndexError. If x is a list of words, x[max(0, i-10):i] is the (up to) ten words before i and x[i+1:i+1+10] is the (up to) ten words after i.
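
As a sketch of how that could be combined with the ' ' placeholder the question asks for: the helper below (the name `context_window` and the `pad` parameter are assumptions, not from the question) takes the two slices and pads whichever side came up short, so the keyword always sits in the same column. The `max(0, ...)` on the start matters because a negative start would be counted from the end of the list; the end of a slice needs no such guard.

```python
def context_window(words, i, n=10, pad=''):
    """Return n words before words[i], the word itself, and n words after,
    padding with `pad` where the document runs out. Hypothetical helper."""
    before = words[max(0, i - n):i]    # clamp start so it never goes negative
    after = words[i + 1:i + 1 + n]     # an end past len(words) is clipped

    # Pad on the outer edges so the keyword always lands at position n.
    before = [pad] * (n - len(before)) + before
    after = after + [pad] * (n - len(after))
    return before + [words[i]] + after


x = 'a b c d e'.split()
window = context_window(x, 1, n=3)
# window == ['', '', 'a', 'b', 'c', 'd', 'e']
```

Every window has exactly 2*n + 1 entries, so it can be written straight out as one .csv row.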

Dan D.
