
I am trying to extract certain strings from a text file, using the code below. I want to look for each of the strings in all_actions in that file, store every one that is found in an array or list, and then display them in the order in which they were found.

import string 

solution_path = "/homer/my_dir/solution_detail.txt"
solution = open(solution_path).read()

all_actions = ['company_name','email_address','full_name']
n = 0
sequence_array = []
for line in solution:
            for action in all_actions:
                    if action in line:
                        sequence_array[n] = action
                        n = n+1
for x in range(len(sequence_array)):
            print (sequence_array[x])

But this code prints nothing, even though it runs without any error.

Karl Knechtel
Sam

2 Answers


There are multiple problems with the code.

  1. .read() on a file produces a single string. As a result, for line in solution: iterates over each character of the file's text, not over each line. (The name line is not special, in case you thought it was. The iteration depends only on what is being iterated over.) The natural way to get lines from the file is to loop over the file itself, while it is open. To keep the file open and make sure it closes properly, we use a with block.

  2. You may not simply assign to sequence_array[n] unless the list is already at least n+1 elements long; doing so raises an IndexError. (You don't get an error from this only because if action in line: is never true, due to the first point.) Fortunately, we can simply .append to the end of the list instead.

  3. If a line contains more than one of the all_actions, it would be stored more than once. This is probably not what you want to happen. The built-in any function makes it easier to deal with this problem; we can supply it with a generator expression for an elegant solution. But if your exact needs are different, then of course there are different approaches.

  4. While the last loop is okay in theory, it is better to loop directly, the same way you attempt to loop over solution. But instead of building up a list, we could instead just print the results as they are found.
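
The first two points can be checked in isolation. The following is a minimal sketch, not part of the original code; the text variable is a made-up stand-in for what .read() would return from the file.

```python
# Hypothetical sample text standing in for the contents of the file.
text = "company_name: Acme\nemail_address: a@b.c\n"

# Iterating over a string yields single characters, not lines.
pieces = [piece for piece in text]
print(pieces[:3])  # ['c', 'o', 'm']

# A one-character string can never contain a multi-character name,
# so the membership test is never true.
hits = [piece for piece in text if "company_name" in piece]
print(hits)  # []

# Assigning to an index that does not exist yet raises IndexError.
sequence_array = []
try:
    sequence_array[0] = "company_name"
except IndexError as error:
    index_error_message = str(error)
    print(index_error_message)

# append grows the list by one element instead.
sequence_array.append("company_name")
print(sequence_array)  # ['company_name']
```

Because the membership test never succeeds in the original code, the faulty assignment is never reached, which is why the script ran silently.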

So, for example:

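with open(solution_path) as solution:
    # Looping over the open file yields one line at a time.
    for line in solution:
        # True as soon as any one of the actions appears in this line.
        if any(action in line for action in all_actions):
            # line already ends with a newline, so suppress print's own.
            print(line, end='')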
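
To make the third point concrete, here is a small sketch comparing the nested loop with the any approach on the same input. The sample text is invented for illustration, and io.StringIO is used only so the example runs without a real file; it can be looped over line by line just like an open file.

```python
import io

all_actions = ['company_name', 'email_address', 'full_name']

# Hypothetical file contents, wrapped so we can loop over lines
# the same way we would loop over an open file.
sample = io.StringIO(
    "set full_name and email_address\n"
    "nothing relevant here\n"
    "set company_name\n"
)

per_action = []  # one entry per (line, action) match
per_line = []    # one entry per matching line

for line in sample:
    # Nested loop: a line with two actions contributes two entries.
    for action in all_actions:
        if action in line:
            per_action.append(action)
    # any: a line contributes at most one entry, however many actions match.
    if any(action in line for action in all_actions):
        per_line.append(line.rstrip('\n'))

print(per_action)  # ['email_address', 'full_name', 'company_name']
print(per_line)    # ['set full_name and email_address', 'set company_name']
```

Note that within a single line the nested loop reports actions in the order they are listed in all_actions, not in the order they appear in the text.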
Karl Knechtel
  • Thanks a lot for the detailed explanation and clarification. Your code also works perfect. – Sam Apr 08 '20 at 23:09

What is happening is that solution contains all the text inside the file. Therefore when you are iterating for line in solution you are actually iterating over each and every character separately, which is why you never get any hits.

Try the following code (I can't test it since I don't have your file):

solution_path = "/homer/my_dir/solution_detail.txt"
all_actions = ['company_name','email_address','full_name']


sequence_array = []
with open(solution_path, 'r') as f:
    for line in f:
        for action in all_actions:
            if action in line:
                sequence_array.append(action)

This will collect all the actions found in the document. If you want to print all of them:

for action in sequence_array:
    print(action)
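
If "the same order" should also hold within a single line, one possible refinement is to sort the matches by where they occur. This is only a sketch under that assumption: actions_in_order is a hypothetical helper, the sample lines are made up, and only the first occurrence of each action in a line is considered.

```python
def actions_in_order(lines, actions):
    """Return the actions found in lines, ordered by where they appear."""
    found = []
    for line in lines:
        # Pair each matching action with the index of its first occurrence.
        positions = [(line.find(action), action)
                     for action in actions if action in line]
        # Sorting the pairs orders the actions left to right within the line.
        found.extend(action for _, action in sorted(positions))
    return found


all_actions = ['company_name', 'email_address', 'full_name']
# Hypothetical lines standing in for the file's contents.
sample_lines = ["set full_name and email_address\n", "set company_name\n"]
print(actions_in_order(sample_lines, all_actions))
# ['full_name', 'email_address', 'company_name']
```

Any iterable of lines works as the first argument, including an open file object.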