
I'm trying to parse a file named kjv.tsv. Each line of the file contains a book name, chapter number, verse number, and verse text.

The output should look like this:

(ge,   0,    0,    In the beginning God created the heaven and the earth.)
(ge,   0,    1,    And the earth was .... upon the face of the waters.)
(ge,   0,    2,    And God said, Let there be light: and there was light.)

This is what I have so far. The function I'm defining is parse_line, which takes a line:

def parse_line(line):
    '''
    Converts a line from kjv.tsv into a list of verse information. I.e.
    [book name, chapter number, verse number, verse text]
    Return a list of verse information
    '''
    bibletext = open("kjv.tsv" , "r").readlines()

    bible = {}
    for line in bibletext.splitlines():
        number, bv, contents = line.split(" | ")
        book, verse = bv.strip().split(" ")
        print (book)
        print (bible)
        if book in bible:
            bible[book].append([verse,contents])
        else:
            bible[book] = [verse,contents]

    print (bible)
  • Does it not work? How do you know? What do you expect it to do? See how to create a [mcve]. – Peter Wood Dec 02 '16 at 23:14
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your code and accurately describe the problem. Note that you have indentation errors and no sample data file. – Prune Dec 02 '16 at 23:30
  • The code doesn't work. It's not returning the correct information in Python. The example that I provided is what I'm looking for. – Nichols Sullivan Dec 03 '16 at 05:26
  • In what way does it not work? What output do you get? We don't have your input files, we can't run the code. The code isn't indented correctly (which can make a difference in Python). – Peter Wood Dec 03 '16 at 09:25
  • You can use the [csv](https://docs.python.org/3/library/csv.html) module from the standard library to parse tabular data files such as this one. – Håken Lid Dec 03 '16 at 09:35
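A minimal sketch of the csv suggestion, assuming the fields are tab-separated (the .tsv extension hints at this, but the raw file isn't shown in the question). The `sample` text and the `read_verses` name are made up for illustration:

```python
import csv
import io

# Hypothetical sample standing in for kjv.tsv; assumes tab-separated fields.
sample = (
    "ge\t0\t0\tIn the beginning God created the heaven and the earth.\n"
    "ge\t0\t2\tAnd God said, Let there be light: and there was light.\n"
)

def read_verses(f):
    """Return a list of [book, chapter, verse, text] rows from a tab-separated file."""
    # QUOTE_NONE keeps any quotation marks in the verse text as literal characters.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]  # skip blank lines

verses = read_verses(io.StringIO(sample))
print(verses[0])
```

With the real file you would pass an open file object instead, e.g. `with open("kjv.tsv", newline="") as f: verses = read_verses(f)`. If the file turns out to use a different delimiter, only the `delimiter` argument needs to change.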

1 Answer


This is way easier than that in Python. You can use a for loop to go through every line in your file and split each line on its first 3 commas, which gives you four fields per line.

bible = []

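with open('kjv.tsv') as f:
    for line in f:
        # Drop the trailing newline, then split on the first 3 commas only,
        # so commas inside the verse text stay part of the text.
        bible.append(line.rstrip('\n').split(',', 3))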

print(bible)
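If you want the per-line logic in a parse_line function, as in the question, here is a minimal sketch. It assumes the file is tab-separated (the usual meaning of .tsv, though I haven't seen your file); the `sep` parameter is my own addition so you can pass ',' or another delimiter instead:

```python
def parse_line(line, sep="\t"):
    """Split one line into [book, chapter, verse, text].

    sep is an assumption: tab is the usual .tsv delimiter.
    """
    # maxsplit=3 leaves any separator characters inside the verse text alone.
    fields = line.rstrip("\r\n").split(sep, 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return fields

line = "ge\t0\t0\tIn the beginning God created the heaven and the earth.\n"
print(parse_line(line))
```

Raising an error on a line with the wrong number of fields makes a wrong delimiter show up immediately, rather than later when the rows are unpacked.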

To learn more, read up on why to use a with statement, how to loop through the lines of a file with a for loop, and how split works.
Note that this is not an answer to the title of this question. The code above matches what you said in the body of your question: the body asked to build a list, so the code builds a list. To build a dictionary you need to decide what the keys and values should be, because you retrieve each value by its key. One option is to nest dictionaries by book, chapter and verse, so that you can retrieve the verse text with something like:

>>> bible_dict['John']['11']['35']
'Jesus wept'

Just add this code to the end of the code above:

bible_dict = {}
for book, chapter, verse, text in bible:
    if not bible_dict.get(book):
        bible_dict[book] = {}
    if not bible_dict[book].get(chapter):
        bible_dict[book][chapter] = {}
    if not bible_dict[book][chapter].get(verse):
        bible_dict[book][chapter][verse] = text

The above code checks if the book is in the dict. If it is, it then checks if the chapter is in the book. If it is, it then checks if the verse is in the chapter. If the verse is not in the chapter, it adds it. The script adds any of the items (book, chapter or verse) that are missing, and it does this for every line in the file.
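The same nested structure can be built a little more compactly with dict.setdefault, which returns the existing value for a key or inserts and returns a default. This is only a sketch of an alternative, not a required change; the two rows in `bible` are stand-in data in the [book, chapter, verse, text] shape:

```python
# Hypothetical rows in the [book, chapter, verse, text] shape built earlier.
bible = [
    ["ge", "0", "0", "In the beginning God created the heaven and the earth."],
    ["ge", "0", "2", "And God said, Let there be light: and there was light."],
]

bible_dict = {}
for book, chapter, verse, text in bible:
    chapters = bible_dict.setdefault(book, {})   # dict of chapters for this book
    verses = chapters.setdefault(chapter, {})    # dict of verses for this chapter
    verses.setdefault(verse, text)               # keep the first text seen for a verse

print(bible_dict["ge"]["0"]["2"])
```

Each setdefault call replaces one "if the key is missing, add it" check.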

The first half of the script turns the file into a list of lists with each line being a list of book, chapter, verse and text.
The second half of the script turns the list of lists into a dict of dicts where each book is a dict of chapters, each chapter is a dict of verses, and each verse number is a key whose value is the verse text.
Please let me know if you need more clarification.

  • Thanks Brandon Keith Biggs. ValueError: need more than 1 value to unpack – Nichols Sullivan Dec 05 '16 at 01:02
  • for book, chapter, verse, text in bible: – Nichols Sullivan Dec 05 '16 at 01:03
  • This means that the csv is created differently than first thought. Do the first half of the script and print bible[0]. What shows up? Is it in the book, chapter, verse, text order? Or is it missing some? Also check bible[1] and see if it is the same. – Brandon Keith Biggs Dec 06 '16 at 07:30
  • What for book, chapter, verse, text in bible does on the first pass through the loop is book = bible[0][0], chapter = bible[0][1], verse = bible[0][2], text = bible[0][3]. Writing that out by hand is really tedious, so you can unpack in the for statement instead. This only works if every list in bible has exactly four items (which they should if taken from a well-formed csv file); a shorter list raises the ValueError you saw. – Brandon Keith Biggs Dec 06 '16 at 07:31
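One way to run the check suggested in these comments on the whole list rather than just bible[0] and bible[1]: report every row that does not have exactly four fields. This is a rough sketch; `find_bad_rows` is a made-up helper name and the rows below are stand-in data, with the second row showing what a line looks like when it was split on the wrong delimiter:

```python
def find_bad_rows(rows, expected=4):
    """Return (index, row) pairs whose field count differs from expected."""
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical rows: the second one contains tabs, so splitting on ',' left it whole.
bible = [
    ["ge", "0", "0", "In the beginning God created the heaven and the earth."],
    ["ge\t0\t1\tAnd the earth was .... upon the face of the waters.\n"],
]

for index, row in find_bad_rows(bible):
    print(index, len(row), row)
```

If every row comes back with a single field, the delimiter passed to split is most likely not the one the file uses.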