Hello, I am very new to programming; the most I know is basic HTML.

I'm trying to split text into 256-character portions. From what I have learned, I should use

inFile = open('words.txt', 'r')

to open a text file

contents = inFile.read()
print(contents)

then I should use

str1 = file.read(256)

to group this text.

But I do not understand how to use these two.

icedwater
  • possible duplicate of [Lazy Method for Reading Big File in Python?](http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – Dan Loewenherz Apr 17 '14 at 01:55
  • Run a python shell, then look at `help(open)`, etc ... every function is documented. Don't worry if you don't understand it all, pick it up some at a time. – icedwater Apr 17 '14 at 03:54
  • ok, but I'm so close! a user actually got me near to the end. – user3543478 Apr 17 '14 at 03:59

3 Answers

The .read method reads up to a given number of bytes (or, for a file opened in text mode on Python 3, characters), or the entire file if no number is specified. To split by characters rather than bytes regardless of version, you can read the whole file and then chunk it up yourself. Example:

# This is just a convenience so you don't have to worry about closing the file
with open('words.txt', 'r') as inFile:
    # Read the file
    contents = inFile.read()
    # This will store the different 256 character bits
    groups = []
    # while the contents contain something
    while contents:
        # Add the first 256 characters to the grouping
        groups.append(contents[:256])
        # Set the contents to everything after the first 256
        contents = contents[256:]
    print(groups)
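The same chunking can also be written with slice offsets instead of repeatedly shrinking the string. This is only a minimal sketch: `chunk_text` is a hypothetical helper name, not something from the standard library, and the sample string stands in for the contents of `words.txt`.

```python
def chunk_text(text, size=256):
    """Split text into consecutive pieces of at most `size` characters."""
    # Step through the string in jumps of `size` and slice out each piece.
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "a" * 600  # stand-in for the text read from the file
groups = chunk_text(sample)
print([len(g) for g in groups])  # [256, 256, 88]
```

The last piece is simply shorter when the length is not a multiple of 256, and an empty string gives an empty list.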
Tim Brown
  • could I see a video on how to apply this? – user3543478 Apr 17 '14 at 02:17
  • You should be able to copy / paste, and just change the `'words.txt'` part to your actual filename. – Tim Brown Apr 17 '14 at 02:43
  • I don't understand. What is the "while contents:" part and everything after it, including the # lines? – user3543478 Apr 17 '14 at 03:35
  • A while loop runs its block for as long as the condition is truthy. In Python, a string is truthy unless it is the empty string (""). Above, it's basically saying "While the contents are a non-empty string, do this block", and the block keeps making the contents smaller and smaller until it's an empty string. The lines starting with a # are just comments, so they won't be executed as code; they are just there to help explain. – Tim Brown Apr 17 '14 at 03:38
  • SyntaxError: EOL while scanning string literal – user3543478 Apr 17 '14 at 03:46
  • ^ when I used the first line it highlighted "infile" – user3543478 Apr 17 '14 at 03:46
  • I forgot a colon. Edited to add it. – Tim Brown Apr 17 '14 at 03:48
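To make the truthiness point from the comments concrete, here is a small sketch of the same loop on a short string, using pieces of 3 characters instead of 256 so the result is easy to read. The names `contents` and `pieces` are only for illustration.

```python
contents = "abcdefgh"
pieces = []

# The loop keeps going while contents is a non-empty string.
while contents:
    pieces.append(contents[:3])  # take the first 3 characters
    contents = contents[3:]      # keep everything after them

print(pieces)                # ['abc', 'def', 'gh']
print(bool(""), bool("x"))   # False True
```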

Alternatively, using a list comprehension:

with open('words.txt', 'r') as inFile:
    groups = [group for group in iter(lambda: inFile.read(256), '')]
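In case the `iter(...)` call looks odd: the two-argument form `iter(callable, sentinel)` keeps calling the function until it returns the sentinel, here the empty string that `read` returns at end of file. A rough sketch, assuming Python 3 and using `io.StringIO` as a stand-in for an open text file:

```python
import io

# An in-memory text "file" with 600 characters (stand-in for words.txt).
fake_file = io.StringIO("x" * 600)

# Call fake_file.read(256) over and over until it returns ''.
groups = list(iter(lambda: fake_file.read(256), ''))

print([len(g) for g in groups])  # [256, 256, 88]
```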

UPDATE

If words.txt contains non-ASCII characters and is UTF-8 encoded, open it with an explicit encoding:

import codecs
with codecs.open('words.txt', 'r', 'utf-8') as inFile:
    groups = [group for group in iter(lambda: inFile.read(256), '')]
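A possible alternative, if exact character counts matter: as far as I understand the `codecs` documentation, the single argument to `read` on a codecs reader is an approximate size rather than a guaranteed number of characters, whereas a text file opened through `io.open` (the same function as the built-in `open` on Python 3) counts characters. The sketch below writes its own temporary file, so the path and sample text are assumptions made for the demonstration.

```python
import io
import os
import tempfile

# Sample data: 300 characters that take 600 bytes in UTF-8.
text = u"\u00e9" * 300
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(text)

# In text mode, read(256) returns up to 256 characters, not bytes.
with io.open(path, "r", encoding="utf-8") as inFile:
    groups = list(iter(lambda: inFile.read(256), u""))

print([len(g) for g in groups])  # [256, 44]
```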
emesday

I think people need to be kinder to those who are new to programming.

inFile = open('words.txt', 'r')
contents = inFile.read()  # Read the whole file from disk into memory.

Now contents holds all the characters in words.txt.

You can get the first 256 characters like this:

str1 = contents[:256]    #Slice

You can get the next 256 characters like this:

str2 = contents[256:512] #Slice
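The pattern in those two slices generalizes: piece number n (counting from 0) starts at n * 256. A small sketch, where `nth_chunk` is a made-up helper name and the short string with a size of 4 is only there to keep the output readable:

```python
def nth_chunk(contents, n, size=256):
    """Return piece number n (counting from 0) of `size` characters."""
    start = n * size
    return contents[start:start + size]


contents = "abcdefghij"  # stand-in for the text read from words.txt
print(nth_chunk(contents, 0, size=4))        # abcd
print(nth_chunk(contents, 1, size=4))        # efgh
print(nth_chunk(contents, 2, size=4))        # ij
print(repr(nth_chunk(contents, 3, size=4)))  # ''
```

Slicing past the end of a string does not raise an error; it just returns a shorter or empty string.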
Kei Minagawa