
I have a gigantic txt file that I read in and cleaned into a list.
I'm looking for certain words, so I wrote a quick function:

def find_words(lines):

    for line in lines:
        if "my words" in line:
            print(line)

This works fine, but how would I write the function so that it prints the matching line plus the next 50 lines or so? In short, I want to find the text that comes after that word.

From there, I want to create an empty df and have the function add a new row to it, containing the matching line plus the next 50 lines, every time it finds that word.

Stidgeon
Dasax121
  • Hey Alex, have you heard of regular expressions? You could use them for this task with Python's `re` module: https://docs.python.org/3/library/re.html – Michael Silverstein Nov 26 '19 at 18:21
  • You already showed that you know how to use a `for` loop. Where are you stuck? You could loop 50 times, reading and printing. Alternatively, set a counter to 50 and check it, or set a flag (boolean) that signals when you're in the state of printing lines. There are many ways to track what you're doing without breaking out of the loop, and several ways to do it within a loop. Have you drawn a flowchart of the steps you want? – Prune Nov 26 '19 at 18:22
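The two comments above suggest matching with a regular expression and tracking state with a counter instead of slicing. A minimal sketch combining both, assuming `lines` is the cleaned list of strings from the question; the function name `lines_after_match` and its `count` parameter are hypothetical. It collects lines instead of printing them so the result can be reused, and because it never indexes into the input, it also works on a file object.

```python
import re
from typing import Iterable, List


def lines_after_match(lines: Iterable[str], pattern: str, count: int = 50) -> List[str]:
    """Collect each line matching `pattern` plus up to `count` lines after it.

    A counter tracks how many trailing lines are still owed; a new match
    while the counter is running simply restarts the countdown.
    """
    regex = re.compile(pattern)
    remaining = 0  # lines still to collect after the most recent match
    out: List[str] = []
    for line in lines:
        if regex.search(line):
            out.append(line)
            remaining = count  # (re)start the countdown
        elif remaining > 0:
            out.append(line)
            remaining -= 1
    return out
```

For example, `print(*lines_after_match(lines, r"my words"), sep="\n")` prints every match with its following lines. If the search phrase may contain regex metacharacters, pass `re.escape(phrase)` as the pattern.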

3 Answers


Quick & dirty solution:

for i, line in enumerate(lines):
    if "my words" in line:
        print(*lines[i:i+50], sep="\n")
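
  • `enumerate` sets `i` to the index of the current line in the `lines` list.
  • When the matching line is found, print the slice of `lines` that starts at the current index and spans 50 elements: the matching line plus the 49 lines after it (use `i+51` for the match plus a full 50 following lines). A slice that runs past the end of the list is simply truncated, so no index error occurs.
  • `sep="\n"` prints each line of the slice on its own line.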

If your document has a huge number of lines, you might want to avoid loading all the lines at once in memory (check https://stackoverflow.com/a/48124263/11245195 - but the workaround for your problem might be different).
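Following up on the memory concern, a minimal sketch that reads lazily instead of building the full list first, assuming the input is an open file object (or any iterable of lines). The name `windows_from_file` and its parameters are hypothetical, and this is not the approach from the linked answer. One behavioral difference from the slicing version: lines pulled into a window are consumed by `itertools.islice`, so a match inside an earlier window is not reported as a new group.

```python
import itertools
from typing import Iterable, Iterator, List


def windows_from_file(f: Iterable[str], phrase: str, count: int = 50) -> Iterator[List[str]]:
    """Yield [matching line, next `count` lines] groups without reading everything into memory."""
    lines = (line.rstrip("\n") for line in f)  # strip newlines, stay lazy
    for line in lines:
        if phrase in line:
            # islice pulls the following lines from the same generator,
            # so the outer loop resumes after the window
            yield [line] + list(itertools.islice(lines, count))
```

With a real file this would be used as `with open("gigantic.txt") as f:` followed by a loop over `windows_from_file(f, "my words")`; only the current window is held in memory at a time.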

marc_s
David Lor
  • This did the trick! Thanks so much. Would you know how to make each instance into a new list, or a new row in a pandas df? – Dasax121 Nov 26 '19 at 18:51
  • What do you mean with instance? Each group of 50 consecutive lines? – David Lor Nov 26 '19 at 21:36
  • Sorry for the confusion, but yes! Whenever it comes across "my words", instead of printing that line plus the next 50 lines, it should put them into a list – Dasax121 Nov 26 '19 at 22:11
  • That `lines[i:i+50]` is itself a list, so you already have it: assign it to a variable or do whatever you want with it – David Lor Nov 27 '19 at 07:42
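To get from those slices to the pandas DataFrame asked about in the question, one option is to collect one record per match and build the frame once at the end. A minimal sketch, assuming pandas is installed; the function name `matches_to_dataframe` and the column names `line_no`, `lines` and `text` are hypothetical choices. The design choice here is deliberate: appending rows one at a time to an initially empty DataFrame is slow, so the rows are gathered in a plain list first.

```python
from typing import List

import pandas as pd


def matches_to_dataframe(lines: List[str], phrase: str, count: int = 50) -> pd.DataFrame:
    """Build one DataFrame row per occurrence of `phrase`.

    Each row holds the index of the matching line, the slice
    lines[i:i + count] (the match plus the count - 1 lines after it,
    mirroring lines[i:i+50] above) and the same lines joined into one string.
    """
    rows = []
    for i, line in enumerate(lines):
        if phrase in line:
            window = lines[i:i + count]  # already a list
            rows.append({"line_no": i, "lines": window, "text": "\n".join(window)})
    # Build the frame once; passing columns keeps them even when nothing matched
    return pd.DataFrame(rows, columns=["line_no", "lines", "text"])
```

Calling `matches_to_dataframe(lines, "my words", count=51)` gives the match plus a full 50 following lines per row.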

If Python isn't a requirement and you're on *nix, this is a one-line grep task, for example:

$ grep "my words" gigantic.txt -A 50

Note: `-A N` prints N lines of trailing context after each match; `-B N` prints N lines of leading context before it.

mkelandis
  • Assuming OP is using *nix, which they didn't specify – C.Nivs Nov 26 '19 at 19:03
  • @C.Nivs -- you are correct, I should have stated that plainly as well. This doesn't address the question but I didn't have enough rep to comment on the original question at the time. I did think it might be helpful :) Updated to reflect *nix – mkelandis Nov 27 '19 at 15:46
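Since the OS wasn't specified, here is a rough pure-Python analogue of `grep -B` / `-A` that runs anywhere. This is a sketch, not a reimplementation of grep: the name `grep_context` and its parameters are hypothetical, a `collections.deque` with `maxlen` holds the leading context, and a countdown handles the trailing context. Unlike grep, it does not print `--` separators, and groups that merely touch may be yielded separately rather than merged.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def grep_context(lines: Iterable[str], phrase: str, before: int = 0, after: int = 50) -> Iterator[List[str]]:
    """Yield groups of lines: up to `before` lines, a match, then up to `after` lines."""
    recent: Deque[str] = deque(maxlen=before)  # last `before` lines not yet emitted
    group: List[str] = []  # the context group currently being built
    remaining = 0  # trailing lines still owed to the current group
    for line in lines:
        if phrase in line:
            if not group:
                group.extend(recent)  # leading context, like grep -B
            recent.clear()
            group.append(line)
            remaining = after  # trailing context, like grep -A
        elif remaining > 0:
            group.append(line)
            remaining -= 1
        else:
            if group:  # first line outside the window closes the group
                yield group
                group = []
            recent.append(line)
    if group:  # flush a group that runs to the end of the input
        yield group
```

With the defaults, `grep_context(lines, "my words")` behaves like `grep "my words" -A 50` on the cleaned list.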

I personally prefer while loops in general, but I'd probably get criticized for using one here since it would take more lines, so here is what I suggest:

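    def find_words(lines):
        # enumerate provides the index, so the following lines can be looked up
        for i, line in enumerate(lines):
            if "my words" in line:
                print(line)
                # print the next 50 lines; the slice stops safely at the end of the list
                for next_line in lines[i + 1:i + 51]:
                    print(next_line)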

I'm in a bit of a rush, so I haven't tested this, but logically it should work.
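Since the answer mentions a preference for while loops, here is a minimal sketch of what a while-loop version could look like. The name `find_words_while` and its `phrase`/`count` parameters are hypothetical. Because the index jumps past each collected window, a match that falls inside an earlier window is not reported again, which differs from the `enumerate`/slice answer above.

```python
from typing import List


def find_words_while(lines: List[str], phrase: str = "my words", count: int = 50) -> List[List[str]]:
    """Collect each match plus the next `count` lines, using an explicit index."""
    groups: List[List[str]] = []
    i = 0
    while i < len(lines):
        if phrase in lines[i]:
            groups.append(lines[i:i + count + 1])  # match + next `count` lines
            i += count + 1  # skip past the window just taken
        else:
            i += 1
    return groups
```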

wondercoll