
I have several txt files that contain the last and first names of authors. Here are two examples out of about thirty (the files do not all contain the same number of authors).

authors1.txt

AU  - Jordan, M. 
AU  - Thomson, J.J.  
AU  - Einstein, A.  
AU  - Tesla, N.

authors3.txt

AU  - Agassi, A.
AU  - Herbert, P.H.
AU  - Agut, R.B. 

I want to extract the last and first name of each author in each file. Since I am a beginner in Python, I wrote a script that more or less works:

with open('authors3.txt', 'rb') as f:
    textfile_temp = f.read()

#o_author1 
o_author1 = textfile_temp.split('AU  - ')[1]
L_name1  = o_author1.split(",")[0]
F_name1  = o_author1.split(",")[1]
print(L_name1)
print(F_name1)

#o_author2 
o_author2 = textfile_temp.split('AU  - ')[2]
L_name2  = o_author2.split(",")[0]
F_name2  = o_author2.split(",")[1]
print(L_name2)
print(F_name2)

#o_author3 
o_author3 = textfile_temp.split('AU  - ')[3]
L_name3  = o_author3.split(",")[0]
F_name3  = o_author3.split(",")[1]
print(L_name3)
print(F_name3)

My result is:

Agassi
 A.

Herbert
 P.H.

Agut
 R.B.

My question: is it possible to write this script with a loop, given that the authors#.txt files do not all contain the same number of authors?

Aimery

4 Answers


Using a simple for-loop

Demo:

authors_lastName = []
authors_firstName = []
for filename in ("authors1.txt", "authors3.txt"):
    with open(filename, "r") as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            # drop the "AU  - " tag, then split "Last, First" at the comma
            last, first = line.split("-", 1)[1].split(",", 1)
            authors_lastName.append(last.strip())
            authors_firstName.append(first.strip())
print(authors_lastName)
print(authors_firstName)

Output:

['Jordan', 'Thomson', 'Einstein', 'Tesla', 'Agassi', 'Herbert', 'Agut']
['M.', 'J.J.', 'A.', 'N.', 'A.', 'P.H.', 'R.B.']
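
If you would rather keep each author's names together instead of in two parallel lists, here is a minimal sketch of the same loop wrapped in a function. The name parse_authors is made up for this example, and it assumes every author line has the form "AU  - Last, First"; lines without the AU tag are skipped.

```python
def parse_authors(lines):
    """Return a list of (last_name, first_name) tuples from 'AU  - Last, First' lines."""
    authors = []
    for line in lines:
        # split at the first "-" only, so hyphenated last names stay intact
        tag, sep, name = line.strip().partition("-")
        if tag.strip() != "AU" or not sep:
            continue  # skip blank or unrelated lines
        last, _, first = name.partition(",")
        authors.append((last.strip(), first.strip()))
    return authors


# Works for any number of authors; an opened file can be passed in the same way
sample = ["AU  - Agassi, A.\n", "AU  - Herbert, P.H.\n", "AU  - Agut, R.B. \n", "\n"]
print(parse_authors(sample))
```

Because the function only needs something that yields lines, you can call it as parse_authors(infile) inside the with block.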
Rakesh

I suggest you read the file line by line, for example:

with open('authors1.txt', 'r') as f:  # text mode, so each line is a str
    lines = f.readlines()

# lines = ["AU  - Jordan, M. \n", "AU  - Thomson, J.J.  \n", "AU  - Einstein, A.  \n", "AU  - Tesla, N."]

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # strip the trailing whitespace/newline, then drop the "AU  - " tag
    author = line.strip().split('AU  - ')[1]
    last_name, first_name = author.split(",")
    print(last_name)
    print(first_name)

Jordan
 M.
Thomson
 J.J.
Einstein
 A.
Tesla
 N.
BcK

You can list the files in your current (or any other) directory with os.listdir() or os.walk(). Once you have a list of author text files, you can go through them with a simple for loop.

Hint: a for loop over a file object yields one line at a time until it reaches the end of the file. This is also memory efficient, because only the current line is held in memory rather than the entire file contents.
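
To illustrate the hint, here is a small sketch. io.StringIO stands in for an opened file (an assumption made so the snippet runs without any file on disk); a file object returned by open() is iterated in exactly the same way.

```python
import io

# StringIO stands in for open('authors3.txt'); both yield one line per iteration
fake_file = io.StringIO("AU  - Agassi, A.\nAU  - Herbert, P.H.\nAU  - Agut, R.B. \n")

count = 0
for line in fake_file:  # lines are produced one by one, not loaded into a list first
    count += 1
    print(repr(line))   # repr() makes the trailing newline visible

print(count)
```

Note that each line still carries its trailing newline, which is why the code strips it before splitting.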

If you move the author name extraction into a function, you can simplify your code to something like this:

import os

def get_author(line):
    # "AU  - Agassi, A." -> "Agassi, A." -> last name, first name
    name = line.strip().split('AU  - ')[1]
    lastname, firstname = name.split(',', 1)
    return firstname.strip(), lastname.strip()

if __name__ == '__main__':
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    # You probably want a fancier way of detecting author files
    files = sorted(f for f in files if f.startswith('authors') and f.endswith('.txt'))

    authors = []
    for filename in files:
        with open(filename, 'r') as fd:
            for line in fd:
                if line.strip():  # skip blank lines
                    authors.append(get_author(line))
    print(authors)

At the end of the script, authors will be a list of tuples, each holding the first and last name of one author.
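
As one possible "fancier" way of detecting the author files, here is a sketch that uses glob and keeps the authors grouped per file. The function name authors_by_file and the temporary demo directory are made up for this example; it assumes the files match authors*.txt and use the "AU  - Last, First" layout from the question.

```python
import glob
import os
import tempfile


def authors_by_file(directory):
    """Map each authors*.txt file name to its list of (first, last) tuples."""
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, "authors*.txt"))):
        names = []
        with open(path, "r") as fd:
            for line in fd:
                if "AU  - " not in line:
                    continue  # skip blank or unrelated lines
                name = line.strip().split("AU  - ")[1]
                last, _, first = name.partition(",")
                names.append((first.strip(), last.strip()))
        result[os.path.basename(path)] = names
    return result


# Demo on a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "authors3.txt"), "w") as fd:
        fd.write("AU  - Agassi, A.\nAU  - Herbert, P.H.\nAU  - Agut, R.B. \n")
    demo = authors_by_file(tmp)
    print(demo)
```

In your case you would call authors_by_file('.') from the directory that holds the thirty files; the number of authors per file does not matter.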

samu

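I'm a bit rusty on my Python, so take this as a rough sketch:

import re

with open("authors1.txt") as f:
    lines = [l for l in f.read().splitlines() if l.strip()]  # drop blank lines

for line in lines:
    # split on "-" and ",": "AU  - Jordan, M." -> ["AU  ", " Jordan", " M."]
    parts = re.split("[-,]", line)
    print(parts[1].strip(), parts[2].strip())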

And that's it. Read the entire file into a variable, iterate over each line and extract the parts.

Or, basically do what @Rakesh suggested =)

Immersive