
I have a keywords.txt file like this:

    #section1
    keyword1
    keyword2
    ......
    #section2
    keyword3
    keyword4
    ......
    #section3
    keyword5
    keyword6
    ......

There are many keywords in each section, and there are many sections. My question is: how can I extract each section into a separate list, as in the following output:

    section1=["keyword1","keyword2"]
    section2=["keyword3","keyword4"]
    ......

This is what I have done so far, to extract the line numbers of the "#" separators:

    separator_numlist = []
    with open("keywords.txt") as f:
        for num, line in enumerate(f):
            if '#' in line:
                separator_numlist.append(num)
    # Then read the lines between each pair of separator line numbers

Is there a better solution? I'm also thinking of storing these keywords in XML or JSON; perhaps reading sections from a structured file is more efficient than reading from a txt file.

will

2 Answers


You can use a dict:

    dic = dict()
    with open('keywords.txt', 'r') as f:
        for i in f:
            if i.startswith('#'):
                # A header line: the section name is the text after '#'
                dic_key = i.replace("#", "").strip()
            else:
                # A keyword line: append it to the current section's list
                dic.setdefault(dic_key, []).append(i.strip())

Output:

    {'section1': ['keyword1', 'keyword2'], 'section2': ['keyword3', 'keyword4'], 'section3': ['keyword5', 'keyword6']}
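
The loop above assumes that the first line of the file is a `#` header and that there are no blank lines. A slightly more defensive variant could look like the sketch below. The helper name `parse_sections` and the in-memory sample are assumptions made for illustration, not part of the original answer; blank lines are skipped, and keyword lines that appear before any header are ignored.

```python
from collections import defaultdict
import io


def parse_sections(lines):
    """Group keyword lines under the most recent '#section' header."""
    sections = defaultdict(list)
    current = None  # no header seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            current = line[1:].strip()
            sections[current]  # register the section even if it stays empty
        elif current is not None:
            sections[current].append(line)
    return dict(sections)


# Hypothetical sample standing in for keywords.txt
sample = io.StringIO(
    "#section1\nkeyword1\nkeyword2\n\n#section2\nkeyword3\nkeyword4\n"
)
result = parse_sections(sample)
print(result)
```

Because the function accepts any iterable of lines, it can also be called with an open file object, for example `parse_sections(open("keywords.txt"))`.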

You can also import json and use it to convert the dict to a JSON string:

    import json
    json_output = json.dumps(dic)
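
On the question's idea of storing the keywords in JSON: a minimal round-trip sketch, assuming the dict has already been built as above. The file name `keywords.json` and the temporary directory are placeholders chosen for illustration. Whether this is noticeably faster than parsing the text file is not something the sketch measures.

```python
import json
import os
import tempfile

# Assumed result of the parsing step
dic = {
    "section1": ["keyword1", "keyword2"],
    "section2": ["keyword3", "keyword4"],
}

# Hypothetical output location; any writable path would do
path = os.path.join(tempfile.mkdtemp(), "keywords.json")

# Write the sections to a JSON file
with open(path, "w") as f:
    json.dump(dic, f, indent=2)

# Read them back; each section is a plain list again
with open(path) as f:
    restored = json.load(f)

section1 = restored["section1"]
print(section1)
```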
LinPy

Like LinPy I'd suggest a dict too:

    with open("split.txt") as fpntr:
        data = fpntr.read()

    # Split on '#', then on whitespace: the first token of each chunk is the
    # section name and the remaining tokens are its keywords.
    out = {
        y[0]: y[1:] for y in [x.split() for x in data.split('#') if x]
    }

    print(out)

gives

    {'section1': ['keyword1', 'keyword2'], 'section2': ['keyword3', 'keyword4'], 'section3': ['keyword5', 'keyword6']}

The `if x` is there to eliminate empty strings.
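
One caveat: the comprehension splits each chunk on any whitespace, so a keyword that itself contains a space would be broken into two entries. If that matters, a variant that splits each chunk into lines instead might look like this sketch. The sample string is an assumption standing in for the file contents.

```python
# Assumed sample; "key word1" deliberately contains a space
data = "#section1\nkey word1\nkeyword2\n#section2\nkeyword3\n"

out = {}
for chunk in data.split("#"):
    # Keep one entry per non-blank line rather than per whitespace token
    lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
    if lines:  # skip empty chunks, e.g. the text before the first '#'
        out[lines[0]] = lines[1:]

print(out)
```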

mikuszefski
  • [Using list comprehensions instead of for-loops for side-effects, and dropping the list, is bad style](https://stackoverflow.com/questions/17957181/when-to-drop-list-comprehension-and-the-pythonic-way). There is nothing wrong with for-loops. I removed the downvote because it actually solves the issue. – Jan Christoph Terasa Nov 05 '19 at 07:56
  • @JanChristophTerasa [here](https://hashnode.com/post/when-should-you-not-use-list-comprehensions-in-python-cj2a35qsh0020u9532z9e1n6z) section 1 refers to your "side effects" and directly gives an example where those "side effects" are actually a meaningful purpose. I consider my code as one of those exceptions, especially when looking at all the `if else` statements in the `for` loop solution – mikuszefski Nov 05 '19 at 08:31
  • It's not only bad style, but can also lead to less performant code, because depending on the size of problem you are creating a lot of memory overhead for nothing. The new code using a dictionary comprehension is much nicer, cleaner, and more concise. – Jan Christoph Terasa Nov 05 '19 at 08:42
  • @JanChristophTerasa Agreed, I wouldn't use the code for gigabyte data files....but then---to be honest---I wouldn't use Python either. – mikuszefski Nov 05 '19 at 08:48