How to print the length of each line in a multiline text file using regex?

Question

I have been given a basic text file, and I need to use regex in Python to pull out the words on each line and print the number of words per line.

Text File Example:

I have a dog.
She is small and cute,
and likes to play with other dogs.

Example Output:

Line 1: 4
Line 2: 5
Line 3: 7

Any help would be appreciated!

One thing to keep in mind is that the English language is not always this nice. Is _Myers-Briggs_ one word or two? Is _www.website.com_ one word? Word count machines are something where you can get as complicated as you desire. If you'd like to keep it simple, you won't need regex at all, just `str.split()`. — Brad Solomon, Nov 21 '17 at 17:44
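To make the comment concrete, here is a small Python 3 sketch comparing the two approaches on the comment's own examples. The helper names `split_count` and `regex_count` are made up for illustration, and `\b\w+\b` is only one possible definition of a word:

```python
import re

# One possible definition of a "word": a run of letters, digits or underscores.
WORD_RE = re.compile(r"\b\w+\b")

def split_count(line):
    # str.split() with no arguments splits on runs of whitespace.
    return len(line.split())

def regex_count(line):
    # Counts the non-overlapping matches of WORD_RE in the line.
    return len(WORD_RE.findall(line))

for sample in ["Is Myers-Briggs one word?", "Visit www.website.com today"]:
    print(sample, "->", split_count(sample), "vs", regex_count(sample))
```

The two approaches disagree on both inputs (4 vs 5, then 3 vs 5), so which count is "right" depends on how you define a word.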

Ron · Answer 1 · 2017-11-21T17:55:46.513


You can try splitting each line into words:

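with open('input_file_name.txt') as input_file:
    # enumerate numbers the lines from 1; iterating the file reads it line by line
    for line_number, line in enumerate(input_file, start=1):
        # split() with no argument splits on runs of whitespace,
        # so repeated spaces and the trailing newline are ignored
        print('Line {}: {}'.format(line_number, len(line.split())))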

edited Nov 21 '17 at 17:55

answered Nov 21 '17 at 17:49

Ron


score 0 · Answer 2 · answered Nov 21 '17 at 17:49


with open(path_to_text_file, "r") as f:
    for counter, line in enumerate(f, start=1):  # read the file line by line
        # split(" ") splits on single spaces, so this assumes exactly one space between words
        print("Line %d: %d" % (counter, len(line.split(" "))))
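The one-space assumption in the comment above matters. A minimal Python 3 comparison of `split(" ")` with argument-less `split()` on a line containing a double space and on an empty line:

```python
samples = ["I  have a dog.\n", "\n"]  # a double space; an empty line

for s in samples:
    # split(" ") yields an empty string between consecutive spaces and
    # leaves the trailing newline attached to the last token.
    by_space = s.split(" ")
    # split() collapses runs of whitespace and drops leading/trailing whitespace.
    by_whitespace = s.split()
    print(repr(s), len(by_space), len(by_whitespace))
```

The first approach reports 5 and 1, the second 4 and 0, so `len(line.split())` is the more robust count when spacing is irregular.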

answered Nov 21 '17 at 17:49

Roopak A Nelliat


Cole Tierney · Answer 3 · 2017-11-21T22:07:06.590


You could try awk, which splits on runs of whitespace by default:

cat <<EOT | awk '{print NF}'
> I have a dog.
> She is small and cute,
> and likes to play with other dogs.
> EOT
4
5
7

NF is an awk variable which is always set to the number of fields in the current record.
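For comparison, Python's argument-less `str.split()` splits on runs of whitespace much like awk's default field splitting, so `len(line.split())` plays roughly the role of `NF`. A minimal sketch, where `field_counts` is a hypothetical helper and `io.StringIO` stands in for the heredoc:

```python
import io

def field_counts(lines):
    # Like awk's NF: the number of whitespace-separated fields in each line.
    return [len(line.split()) for line in lines]

# io.StringIO stands in for the heredoc input used with awk.
text = io.StringIO(
    "I have a dog.\n"
    "She is small and cute,\n"
    "and likes to play with other dogs.\n"
)
for count in field_counts(text):
    print(count)
```

Like the awk version, this prints 4, 5 and 7, and `field_counts` accepts any iterable of lines, such as an open file or `sys.stdin`.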

edited Nov 21 '17 at 22:07

answered Nov 21 '17 at 17:50

Cole Tierney


score 0 · Answer 4 · answered Nov 21 '17 at 17:52


This very intuitive regex might help:

\b\w+\b

It matches each run of word characters between two word boundaries, i.e. each word. You just need to count how many matches there are.
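In Python, counting the matches could look like the sketch below (`count_words` is a hypothetical helper name). Note that applying it to the whole text at once gives one total for the file, not one count per line:

```python
import re

WORD_RE = re.compile(r"\b\w+\b")

def count_words(text):
    # findall returns every non-overlapping match; its length is the word count.
    return len(WORD_RE.findall(text))

sample = (
    "I have a dog.\n"
    "She is small and cute,\n"
    "and likes to play with other dogs.\n"
)
print(count_words("I have a dog."))  # a single line: 4
print(count_words(sample))           # the whole text at once: 16 (4 + 5 + 7)
```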

If you want words with hyphens (or any other characters) to count as one word, add - (or those characters) to the character set:

\b[\w\-]+\b

or

\b[\w\-'.]+\b

etc.

You get the idea.
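A quick Python check of the wider character class, written with the `+` quantifier so it matches whole runs of characters; the sample sentences are invented for illustration:

```python
import re

PLAIN = re.compile(r"\b\w+\b")
# Hyphens, apostrophes and dots inside a word no longer split it.
# The trailing \b still keeps a sentence-ending "." out of the match.
EXTENDED = re.compile(r"\b[\w\-'.]+\b")

for s in ["Is Myers-Briggs one word?", "I don't own www.website.com."]:
    print(s, PLAIN.findall(s), EXTENDED.findall(s))
```

With the wider class, _Myers-Briggs_, _don't_ and _www.website.com_ each count as one word.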

answered Nov 21 '17 at 17:52

Sweeper


This pulls all the words in the file, but I need to count the words within a line. There is nothing to demarcate the end of the line in the output. – Zoey Nov 22 '17 at 02:29
@Zoey Refer to Roopak A Nelliat's answer if you don't know how to read the file line by line. – Sweeper Nov 22 '17 at 06:33
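Putting the two answers together, the regex is applied to each line separately, which gives the per-line output the question asks for. A minimal Python 3.6+ sketch; `word_counts_per_line` and `print_word_counts` are hypothetical names, and the UTF-8 encoding is an assumption about the input file:

```python
import re

WORD_RE = re.compile(r"\b\w+\b")

def word_counts_per_line(lines):
    # Pair each line number (starting at 1) with the number of regex matches in that line.
    return [(number, len(WORD_RE.findall(line)))
            for number, line in enumerate(lines, start=1)]

def print_word_counts(path):
    # Iterating over the open file yields one line at a time,
    # so a count never runs across a line break.
    with open(path, encoding="utf-8") as f:
        for number, count in word_counts_per_line(f):
            print(f"Line {number}: {count}")
```

Calling `print_word_counts('input_file_name.txt')` on the example file from the question should print `Line 1: 4`, `Line 2: 5` and `Line 3: 7`.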
