Break text into 256-character chunks

Question

Hello, I am very new to programming; the most I know is basic HTML.

I'm trying to split text into 256-character portions. From what I've learned, I should use

inFile = open('words.txt', 'r')

to open a text file

contents = inFile.read()
print(contents)

then I should use

str1 = inFile.read(256)

to group this text.

But I do not understand how to use these two.

possible duplicate of [Lazy Method for Reading Big File in Python?](http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) — Dan Loewenherz, Apr 17 '14 at 01:55
Run a python shell, then look at `help(open)`, etc ... every function is documented. Don't worry if you don't understand it all, pick it up some at a time. — icedwater, Apr 17 '14 at 03:54
ok, but I'm so close! a user actually got me near to the end. — user3543478, Apr 17 '14 at 03:59

Tim Brown · Answer 1 · 2014-04-17T03:50:40.397

3

The `.read` method reads a given number of bytes (in Python 2; in Python 3, a file opened in text mode counts characters instead), or the entire file if no number is specified. To split by characters rather than bytes, read the whole file and then chunk it up yourself. Example:

# This is just a convenience so you don't have to worry about closing the file
with open('words.txt', 'r') as inFile:
    # Read the file
    contents = inFile.read()
    # This will store the different 256 character bits
    groups = []
    # while the contents contain something
    while contents:
        # Add the first 256 characters to the grouping
        groups.append(contents[:256])
        # Set the contents to everything after the first 256
        contents = contents[256:]
    print(groups)
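
The same grouping can also be written with slices taken at fixed offsets, which avoids re-copying the remaining text on every pass. This is only a minimal sketch and not part of the original answer; the helper name `chunk_text` is hypothetical and chosen for illustration.

```python
def chunk_text(text, size=256):
    """Split text into consecutive pieces of at most `size` characters."""
    # range(0, len(text), size) yields the start offset of each piece:
    # 0, 256, 512, ... when size is 256.
    return [text[start:start + size] for start in range(0, len(text), size)]


# Small demonstration with a short string and a chunk size of 4.
print(chunk_text("abcdefghij", 4))  # ['abcd', 'efgh', 'ij']
```

With a file, you would pass the result of `inFile.read()` as `text`.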

edited Apr 17 '14 at 03:50

answered Apr 17 '14 at 01:45

Tim Brown


could I see a video on how to apply this? – user3543478 Apr 17 '14 at 02:17
You should be able to copy / paste, and just change the `'words.txt'` part to your actual filename. – Tim Brown Apr 17 '14 at 02:43
I don't understand. What is the "while contents:" part and what comes after it, including the # areas? – user3543478 Apr 17 '14 at 03:35
A while loop runs its body for as long as the condition is truthy. In Python, a string is truthy unless it is the empty string (""). Above, it's basically saying "While the contents are a non-empty string, do this block", and the block keeps making the contents smaller and smaller until they are an empty string. The lines starting with a # are just comments; they are not executed as code and are only there to help explain. – Tim Brown Apr 17 '14 at 03:38
SyntaxError: EOL while scanning string literal – user3543478 Apr 17 '14 at 03:46
^ when I used the first line it highlighted "infile" – user3543478 Apr 17 '14 at 03:46
I forgot a colon. Edited to add it. – Tim Brown Apr 17 '14 at 03:48
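
To make the loop described in the comments above easier to follow, here is a small trace of the same idea. It is only a sketch: it uses a short made-up string and a chunk size of 4 instead of a file and 256, so each step fits on one line.

```python
contents = "abcdefghij"  # stand-in for the text read from the file
groups = []
size = 4  # small on purpose; the answer uses 256

while contents:  # an empty string "" is falsy, so the loop stops there
    groups.append(contents[:size])   # keep the first `size` characters
    contents = contents[size:]       # drop them from what is left
    print(repr(contents))            # show what remains after this pass

print(groups)
```

The loop body runs three times here, and `contents` ends up as the empty string.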

emesday · Answer 2 · 2014-04-17T03:17:56.510

1

Alternatively, use a list comprehension:

with open('words.txt', 'r') as inFile:
    groups = [group for group in iter(lambda: inFile.read(256), '')]

UPDATE

If words.txt contains non-ASCII characters and is UTF-8 encoded, decode it while reading:

import codecs
with codecs.open('words.txt', 'r', 'utf-8') as inFile:
    groups = [group for group in iter(lambda: inFile.read(256), '')]
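
For readers on Python 3, a rough equivalent is sketched below. It assumes Python 3, where the built-in `open` accepts an `encoding` argument and `read(256)` on a text-mode file returns at most 256 characters rather than bytes. The function name `read_chunks` and the temporary demo file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile


def read_chunks(path, size=256, encoding="utf-8"):
    """Return the file's text as a list of pieces of at most `size` characters."""
    # Text mode decodes while reading, so read(size) counts characters.
    with open(path, "r", encoding=encoding) as in_file:
        # iter(callable, sentinel) keeps calling read() until it returns ''.
        return list(iter(lambda: in_file.read(size), ""))


# Demonstration on a throwaway file holding 600 copies of U+00E9,
# which takes two bytes per character in UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(("\u00e9" * 600).encode("utf-8"))
try:
    chunks = read_chunks(tmp.name)
    print([len(chunk) for chunk in chunks])  # [256, 256, 88]
finally:
    os.remove(tmp.name)
```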

edited Apr 17 '14 at 03:17

answered Apr 17 '14 at 01:58

emesday


That is the ideal solution, but that will read 256 bytes at a time, and if the file has non-ascii characters it'll get funny. – Tim Brown Apr 17 '14 at 02:03
Right. I didn't know that. I was just suggesting one possible solution. – emesday Apr 17 '14 at 02:08
@TimBrown I tested your code and it has the same problem. – emesday Apr 17 '14 at 02:18
Your answer is better in that it is easier to read and understand. – emesday Apr 17 '14 at 02:40
Because the result is in groups. `print groups` to see the result. – emesday Apr 17 '14 at 05:53

Kei Minagawa · Answer 3 · answered Apr 17 '14 at 02:46

0

I think people need to be kinder to those who are new to programming.

inFile = open('words.txt', 'r')
contents = inFile.read()  # Read the whole file from disk into memory.

Now contents holds all the characters in words.txt.

You can get the first 256 characters like this.

str1 = contents[:256]    #Slice

You can get the second 256 characters like this.

str2 = contents[256:512] #Slice
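
The pattern behind `str1` and `str2` extends to any piece: piece number n starts at offset (n - 1) * 256. Here is a minimal sketch of that idea; the helper name `nth_piece` and the sample text are hypothetical, and Python 3 `print` is assumed.

```python
def nth_piece(contents, n, size=256):
    """Return the n-th piece (counting from 1) of at most `size` characters."""
    start = (n - 1) * size
    # Slicing past the end of a string is safe: it simply returns less text.
    return contents[start:start + size]


# Stand-in for the text read from words.txt: 600 characters in total.
contents = "a" * 256 + "b" * 256 + "c" * 88

str1 = nth_piece(contents, 1)
str2 = nth_piece(contents, 2)
print(len(str1), str1[0])  # 256 a
print(len(str2), str2[0])  # 256 b
```

Nothing appears on screen until a piece is printed, so printing `str2` is what makes it "come up".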

answered Apr 17 '14 at 02:46

Kei Minagawa


though I don't see anything happening :/ – user3543478 Apr 17 '14 at 03:41
@user3543478: Just try `print str1`. – Kei Minagawa Apr 17 '14 at 03:48
your method works! I just need one more thing. I can see str1 but how do I make str2 come up? – user3543478 Apr 17 '14 at 03:58
@user3543478: OK. Then try `print len(str1)` and `print len(str2)`. They print out the length of each string. – Kei Minagawa Apr 17 '14 at 04:17
