How to use loop in Python to extract words (second and third in the line) from a txt file

Question

I have several txt files that contain the last and first names of authors. Here are two examples out of about thirty (the files do not all contain the same number of authors).

authors1.txt

AU  - Jordan, M. 
AU  - Thomson, J.J.  
AU  - Einstein, A.  
AU  - Tesla, N.

authors3.txt

AU  - Agassi, A.
AU  - Herbert, P.H.
AU  - Agut, R.B.

I want to extract the last and first name of each author in each file. Since I am a beginner in Python, I wrote a script that more or less works.

with open('authors3.txt', 'r') as f:  # text mode, so str.split works on Python 3 as well
    textfile_temp = f.read()

#o_author1 
o_author1 = textfile_temp.split('AU  - ')[1]
L_name1  = o_author1.split(",")[0]
F_name1  = o_author1.split(",")[1]
print(L_name1)
print(F_name1)

#o_author2 
o_author2 = textfile_temp.split('AU  - ')[2]
L_name2  = o_author2.split(",")[0]
F_name2  = o_author2.split(",")[1]
print(L_name2)
print(F_name2)

#o_author3 
o_author3 = textfile_temp.split('AU  - ')[3]
L_name3  = o_author3.split(",")[0]
F_name3  = o_author3.split(",")[1]
print(L_name3)
print(F_name3)

My result is:

Agassi
 A.

Herbert
 P.H.

Agut
 R.B.

My question: is it possible to write this script with a loop, given that the authors#.txt files do not all contain the same number of authors?

unless I'm mistaken, wouldn't a [simple for loop](https://stackoverflow.com/questions/8009882/how-to-a-read-large-file-line-by-line-in-python) work? — Sean Breckenridge, May 18 '18 at 07:51
What is the problem? You just don't know how many lines are in the file? Or maybe you want to iterate 3 files together and they have different sizes? — Attersson, May 18 '18 at 08:00
Yes I do not know how many lines exist in the authors.txt file. each file has different authors. — ahmed redha, May 18 '18 at 08:06
I want to iterate n (30 files) files together with different sizes — ahmed redha, May 18 '18 at 08:12

score 3 · Answer 1 · answered May 18 '18 at 07:54

Using a simple for-loop

Demo:

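authors_lastName = []
authors_firstName = []
with open(filename, "r") as infile:   # filename: path to one authors file
    for line in infile:
        if not line.strip():
            continue   # skip blank lines
        # Drop the "AU  -" tag (split on the first "-" only), then split "Last, First" on the comma
        val = line.strip().split("-", 1)[-1].strip().split(",")
        authors_lastName.append(val[0])    # text before the comma is the last name
        authors_firstName.append(val[1])   # text after the comma is the first name
print(authors_lastName)
print(authors_firstName)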

Output:

['Jordan', 'Thomson', 'Einstein', 'Tesla', 'Agassi', 'Herbert', 'Agut']
[' M.', ' J.J.', ' A.', ' N.', ' A.', ' P.H.', ' R.B.']
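
The leading spaces in the second list come from splitting on the comma without stripping the result. Here is a minimal sketch of one way to tidy that up, assuming every author line has the `AU  - Last, First` shape shown in the question. `parse_author` is a hypothetical helper name, not part of the answer above.

```python
def parse_author(line):
    """Return (last_name, first_name) from a line such as 'AU  - Jordan, M. '."""
    # Drop everything up to and including the first '-' (the 'AU  -' tag).
    _, _, name = line.partition("-")
    # Split once on the first comma; anything after it counts as the first name.
    last, _, first = name.partition(",")
    return last.strip(), first.strip()


sample = ["AU  - Jordan, M. ", "AU  - Thomson, J.J.  ", "AU  - Tesla, N."]
parsed = [parse_author(line) for line in sample]
print(parsed)
```

Because `str.partition` only splits at the first occurrence, a hyphenated last name would be kept whole under this assumption.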

score 1 · Answer 2 · answered May 18 '18 at 07:56

I suggest you read your file line by line, for example:

with open('authors1.txt', 'r') as f:   # text mode: each line is a str, not bytes
    lines = f.readlines()

# Each element of lines is one raw line of the file, e.g. "AU  - Jordan, M. \n"

for line in lines:
    # strip() removes the trailing newline and spaces before splitting
    o_author = line.strip().split('AU  - ')[1]
    L_name  = o_author.split(",")[0]
    F_name  = o_author.split(",")[1]
    print(L_name)
    print(F_name)

Jordan
 M.
Thomson
 J.J.
Einstein
 A.
Tesla
 N.

score 1 · Accepted Answer · answered May 18 '18 at 07:58

You can list the files in your current (or any other) directory with os.listdir() or os.walk(). Once you have a list of author text files, you can go through them with a plain for loop.

Hint: looping over a file object with for yields one line at a time until the end of the file is reached, so you never need to know the number of lines in advance. It is also memory-efficient, because only the current line is held in memory instead of the entire file contents.
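
As a rough illustration of that hint, the sketch below loops over a file-like object without knowing how many lines it holds. `io.StringIO` is only a stand-in for an open text file, an assumption made so the example runs without any file on disk.

```python
import io

# io.StringIO behaves like an open text file; it replaces open('authors3.txt') here.
fake_file = io.StringIO("AU  - Agassi, A.\nAU  - Herbert, P.H.\nAU  - Agut, R.B.\n")

last_names = []
for line in fake_file:          # one line per iteration, whatever the line count
    if not line.strip():
        continue                # ignore blank lines
    last_names.append(line.split("AU  - ")[1].split(",")[0])

print(last_names)
```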

If you move the name extraction into a function, you can simplify your code to something like this:

import os

def get_author(line):
    # In "AU  - Last, First" the text before the comma is the last name
    name = line.strip().split('AU  - ')[1]
    lastname, firstname = name.split(',')
    return lastname, firstname.strip()

if __name__ == '__main__':
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    # You probably want a more fancy way of detecting author files
    files = [f for f in files if f.startswith('authors') and f.endswith('.txt')]

    authors = []
    for file in files:
        with open(file, 'r') as fd:
            for line in fd:
                authors.append(get_author(line))
    print(authors)

At the end of the script, authors will be a list of tuples, each holding the last name and first name of one author, in the order they appear in the file.
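
If you would rather keep the authors of each file separate, one option is a dictionary keyed by file name. This is only a sketch under the same `AU  - Last, First` assumption. `authors_by_file` is a hypothetical helper, and the temporary directory with two small files exists purely so the example runs anywhere.

```python
import os
import tempfile


def get_author(line):
    """Return (last_name, first_name) for a line like 'AU  - Agassi, A.'."""
    name = line.strip().split('AU  - ')[1]
    lastname, firstname = name.split(',', 1)
    return lastname.strip(), firstname.strip()


def authors_by_file(directory):
    """Map each authors*.txt file name in `directory` to its list of name tuples."""
    result = {}
    for name in sorted(os.listdir(directory)):
        if not (name.startswith('authors') and name.endswith('.txt')):
            continue
        with open(os.path.join(directory, name), 'r') as fd:
            # Skip blank lines and anything without the 'AU  - ' tag.
            result[name] = [get_author(line) for line in fd if 'AU  - ' in line]
    return result


# Demo data: two throwaway files with different numbers of authors.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'authors1.txt'), 'w') as f:
        f.write('AU  - Jordan, M. \nAU  - Tesla, N.\n')
    with open(os.path.join(tmp, 'authors3.txt'), 'w') as f:
        f.write('AU  - Agassi, A.\n')
    grouped = authors_by_file(tmp)

print(grouped)
```

Each file contributes its own list, so it does not matter that the files contain different numbers of authors.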

score 0 · Answer 4 · answered May 18 '18 at 07:55

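I'm a bit rusty on my Python, so treat this as a rough sketch:

import re

with open('authors1.txt') as f:
    lines = f.readlines()

for line in lines:
    # Split on the " - " after the AU tag and on the comma between the names
    parts = re.split(r'\s+-\s+|,\s*', line.strip())
    if len(parts) >= 3:   # skip blank or malformed lines
        print(parts[1], parts[2])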

And that's it. Read the entire file into a variable, iterate over each line and extract the parts.

Or, basically do what @Rakesh suggested =)
