1

I have a .txt file that contains verses of scripture, one per line. Each line contains a different number of words. Is there a way to take the first 1000 words of the file, create a separate file (for instance, Block 1) and write those words into it, then create another file with the next 1000 words, and so on, while also ignoring the chapter numbers? A response would be greatly appreciated, since I am doing this for a personal statistical project.

  • 3
    You can read the text and split it into words (https://stackoverflow.com/questions/16922214/reading-a-text-file-and-splitting-it-into-single-words-in-python) and then just select the first x words. As a new user, make sure to look around first. – Areza Dec 09 '19 at 23:11

2 Answers

2

This should work:

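from string import ascii_letters

with open('scripture.txt') as fin:
    text = fin.read()

# Keep only letters and whitespace; this drops digits and punctuation,
# so verse markers such as "1:1" disappear entirely.
valid_characters = ascii_letters + '\n\t '
text = ''.join(t for t in text if t in valid_characters)
words = text.split()

# Step through the word list 1000 at a time. The final slice may be
# shorter, so leftover words are written out as well.
for i, start in enumerate(range(0, len(words), 1000)):
    with open('part_%03d.txt' % i, 'w') as fout:
        fout.write(' '.join(words[start:start + 1000]))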
lenik
1
with open('scripture_verses.txt') as f:
    words = []
    i = 0  # total number of words seen so far
    for line in f:
        for word in line.split():
            words.append(word)
            i += 1
            if i % 1000 == 0:
                # A full block of 1000 words: write it and start a new one.
                with open('out{}.txt'.format(i // 1000), 'w') as out:
                    print(' '.join(words), file=out)
                words = []
    # Write any leftover words (fewer than 1000) to a final file.
    if words:
        with open('out{}.txt'.format(i // 1000 + 1), 'w') as out:
            print(' '.join(words), file=out)
tommy.carstensen
  • This one suffers the same problem as the other one. It is including the numbers in it too. Is there a way for it to ignore numbers? Such that "1:1" or "1:21" are ignored? – S. R. Colledge Dec 10 '19 at 03:43
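
One possible way to skip verse references such as "1:1" or "1:21" is to drop every whitespace-separated token that contains a digit before counting words. The following is only a minimal sketch under that assumption: the helper names `split_into_blocks` and `write_blocks`, the `block_{}.txt` file naming, and the UTF-8 encoding are all hypothetical choices, not something taken from the answers above.

```python
import re

# Assumption: any token containing a digit ("1:1", "12", "3:16.") is a
# chapter or verse marker and should not count as a word.
HAS_DIGIT = re.compile(r'\d')

def split_into_blocks(text, block_size=1000):
    """Return a list of word lists, each at most block_size words long."""
    words = [w for w in text.split() if not HAS_DIGIT.search(w)]
    # Slice in steps of block_size; the last block may be shorter.
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]

def write_blocks(in_path, out_pattern='block_{}.txt', block_size=1000):
    """Read in_path and write each block to its own numbered file."""
    with open(in_path, encoding='utf-8') as f:
        blocks = split_into_blocks(f.read(), block_size)
    for n, block in enumerate(blocks, start=1):
        with open(out_pattern.format(n), 'w', encoding='utf-8') as out:
            out.write(' '.join(block) + '\n')
    return len(blocks)
```

Calling `write_blocks('scripture.txt')` would then produce `block_1.txt`, `block_2.txt`, and so on. Note that this filter also discards any real word that happens to contain a digit (for example "3rd"), so a stricter pattern may be preferable if the text contains such words.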