set a limit to the number of words or characters in a string

Question

Say I have a list of string elements:

wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]

and I want each element in my list to have a maximum of, say, 3 words or 20 characters. Is there a function to do this?

score 9 · Accepted Answer · answered Aug 13 '15 at 01:39


Both can be done with a list comprehension:

1) Max 20 characters:

new_list = [item[:20] for item in wordlist]
>>> new_list
["hi what's up home di", 'Oh wise master kakar', 'hello have a da']

2) Max 3 words:

new_list = [' '.join(item.split()[:3]) for item in wordlist]
>>> new_list
["hi what's up", 'Oh wise master', 'hello have a']
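Both limits can also be applied in a single comprehension by word-limiting first and then slicing the result. This is a small sketch that is not part of the original answer; it assumes the word limit should be applied before the character limit, and the final slice can still cut a word in half.

```python
wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]

# Keep at most 3 words, then cut the result to at most 20 characters.
new_list = [' '.join(item.split()[:3])[:20] for item in wordlist]
print(new_list)  # ["hi what's up", 'Oh wise master', 'hello have a']
```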

Alexander

@AustinA, can you show me how you got those results? I think perhaps you're on Python 3, where map returns an iterator, not a list. My results show the list comprehension as being about twice as fast. EDIT: You deleted your comment before I posted this, so I guess you figured it out :). – Cyphase Aug 13 '15 at 03:11
@Cyphase, the first time I ran it, I got those results. When I ran it repeatedly I got the same elapsed time for both solutions, which is why I deleted my comment. I'm on Python 2.7, and with UNIX's `time` command either solution executed in 0.001s. I also found [here](http://stackoverflow.com/questions/1247486/python-list-comprehension-vs-map) that the performance advantages aren't as cut-and-dried as I originally thought. However, I didn't observe results showing that list comprehension is twice as fast, only that they are equal in this situation. How did you test? – Austin A Aug 13 '15 at 03:14

@AustinA, I'll respond under your answer so we don't bother Alexander :). – Cyphase Aug 13 '15 at 03:20

Long story short, list comprehension is significantly faster than Python's `map` in this situation. But who doesn't like lambda expressions?! – Austin A Aug 13 '15 at 03:37
What is `[:n]` called, so that I can look it up in the documentation? – oldboy Sep 29 '18 at 16:07
It is called slicing and is explained here: https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation – Alexander Oct 03 '18 at 03:49
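A quick illustration of why the `item[:20]` approach is safe for short strings: a slice whose stop index is past the end of the string simply returns what is there, whereas indexing a single out-of-range position raises `IndexError`. The sample string is taken from the question.

```python
s = "hello have a da"

# Slicing never raises, even when the stop index is past the end.
print(s[:20])    # hello have a da  (15 characters, returned unchanged)
print(s[:5])     # hello
print(""[:20])   # prints an empty line

# Indexing one position past the end, by contrast, does raise.
try:
    s[20]
except IndexError:
    print("s[20] raises IndexError")
```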

score 3 · Answer 2 · Austin A · 2015-08-13T03:04:38.587

Here's how you would do this using the built-in `map` function with lambda expressions:

20 character limit

wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot"]
new_wordlist = list(map(lambda x: x[:20], wordlist))
>>> new_wordlist
["hi what's up home di", 'Oh wise master kakar']
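This answer was written against Python 2, where `map` returns a list. In Python 3, `map` returns a lazy iterator (a `map` object), so the call has to be wrapped in `list()` to get the output shown. A minimal Python 3 sketch:

```python
wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot"]

lazy = map(lambda x: x[:20], wordlist)
# In Python 3 this is a map object (an iterator), not a list.
print(type(lazy).__name__)  # map

# list() consumes the iterator and materializes the results.
new_wordlist = list(map(lambda x: x[:20], wordlist))
print(new_wordlist)  # ["hi what's up home di", 'Oh wise master kakar']
```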

3 word limit

wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot"]
new_wordlist = list(map(lambda x: ' '.join(x.split(' ')[:3]), wordlist))
>>> new_wordlist
["hi what's up", 'Oh wise master']
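One subtle difference between this answer and the list-comprehension answer: `x.split(' ')` splits on every single space, while `item.split()` splits on runs of whitespace. The two agree on the sample data but diverge when a string contains repeated spaces, as this small example (with an input string made up for illustration) shows:

```python
text = "hi  what's   up home"  # note the runs of repeated spaces

# split(' ') splits on every single space, so runs of spaces produce empty strings.
print(text.split(' '))  # ['hi', '', "what's", '', '', 'up', 'home']
# split() with no argument splits on runs of whitespace and drops the empties.
print(text.split())     # ['hi', "what's", 'up', 'home']

# As a result, a "3 word" limit counts the empty strings as words with split(' ').
three_single_space = ' '.join(text.split(' ')[:3])
three_whitespace = ' '.join(text.split()[:3])
print(repr(three_single_space))  # "hi  what's"
print(repr(three_whitespace))    # "hi what's up"
```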

answered Aug 13 '15 at 01:38 by Austin A, edited Aug 13 '15 at 03:04

In response to a comment on another answer: `WORDLIST_SETUP = """wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]"""; timeit.timeit('[item[:20] for item in wordlist]', setup=WORDLIST_SETUP) == 0.22185492515563965; timeit.timeit('map(lambda item: item[:20], wordlist)', setup=WORDLIST_SETUP) == 0.5172247886657715`. – Cyphase Aug 13 '15 at 03:20
@Cyphase So I shouldn't have trusted the 0.001s results. That was due to an error which I didn't notice until now. When I run Alexander's solutions on my system (OS X, Python 2.7, using UNIX `time`) I get runtimes of 0.022s real and 0.022s real, respectively. When I run my solutions, I get runtimes of 0.022s real and 0.021s real, respectively. From my point of view, the two approaches seem equivalent from a performance perspective. – Austin A Aug 13 '15 at 03:23
I'm using the `timeit` module, not the `time` command. Try running it in an interpreter using the exact code I used. I'll do the same for `time`. – Cyphase Aug 13 '15 at 03:25
Ooh, doh. I know why. You're only running each version once, which is so fast it barely changes the run time; that is vastly dominated by the overhead of starting Python :). Whereas I'm running each version 1,000,000 times (the default `number` for `timeit.timeit`) with no process overhead. The returned times are the total run time for all of those runs. – Cyphase Aug 13 '15 at 03:27
I do see performance differences when using Python's `timeit`. Alexander's solution takes 0.356 seconds and mine takes 0.850 seconds. I really didn't think the map function was THAT much slower than a list comprehension. I was also wondering whether `timeit` used some iterative process to measure code runtime, which, from your previous comment, it looks like it does. – Austin A Aug 13 '15 at 03:29
There's (practically) no overhead when using `timeit`; it's made to time things accurately. `time` is very good for timing the run time of an entire process, but that's not what we want here; we want to time a specific piece of Python code. You *could* make a script that runs the code you want to test 1000000 times, for instance, and that would lessen the effect of the process overhead. But for timing Python code, use `timeit`. – Cyphase Aug 13 '15 at 03:32
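A Python 3 sketch of the `timeit` approach described above. `number` controls how many times the statement runs (its default is 1,000,000), and the return value is the total time in seconds for all runs. The value of `N` is an arbitrary choice for illustration, `list()` is wrapped around `map` so that the lazy Python 3 iterator is actually consumed, and no particular timings are claimed since they depend on the machine.

```python
import timeit

# Setup code runs before timing and is not included in the measurement.
WORDLIST_SETUP = """wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]"""

N = 100_000  # executions per measurement (illustrative choice)

comp_time = timeit.timeit('[item[:20] for item in wordlist]',
                          setup=WORDLIST_SETUP, number=N)
# list() forces the Python 3 map iterator to do its work.
map_time = timeit.timeit('list(map(lambda item: item[:20], wordlist))',
                         setup=WORDLIST_SETUP, number=N)

# Both values are total elapsed seconds for all N executions.
print(f"list comprehension: {comp_time:.3f} s for {N} runs")
print(f"map + lambda:       {map_time:.3f} s for {N} runs")
```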
`timeit` is so accurate that `timeit.timeit('2', number=1)` often returns `0.0` :). Obviously it's not actually taking `0.0` seconds, but that's more an issue of floating point numbers than anything else. – Cyphase Aug 13 '15 at 03:35
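How small a single measurement can get depends on the clock `timeit` uses. On Python 3, that clock is `time.perf_counter`, and its advertised properties can be inspected with `time.get_clock_info`. This is a sketch; the printed values are platform dependent.

```python
import time
import timeit

# In Python 3, timeit's default timer is time.perf_counter.
print(timeit.default_timer is time.perf_counter)  # True

# The clock's advertised resolution in seconds (varies by platform).
info = time.get_clock_info('perf_counter')
print(info.resolution)

# Timing a trivial statement once yields a very small number of seconds.
single = timeit.timeit('2', number=1)
print(single)
```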
I hear you. I also expanded my `time` test to use `wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"] * 1000000`. This showed a truer difference: Alexander's approach took 2.59s and mine took 3.35s. Definitely a difference in favor of list comprehension there. – Austin A Aug 13 '15 at 03:35

score 1 · Answer 3 · answered Aug 13 '15 at 01:48

You can apply both limits together with this code (pass the word limit and the character limit as command-line arguments):

import sys

# Usage: python <script> WORD_COUNT CHAR_COUNT
wordCount = int(sys.argv[1])  # maximum number of words per item
charCount = int(sys.argv[2])  # maximum number of characters per item

wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]

print(wordlist)
for i, currItem in enumerate(wordlist):
    splitItems = currItem.split(" ")
    # Start from the first wordCount words, then drop words from the end
    # until the joined string (spaces included) fits within charCount.
    index = min(wordCount, len(splitItems))
    while index > 0 and len(' '.join(splitItems[:index])) > charCount:
        index -= 1
    wordlist[i] = ' '.join(splitItems[:index])
print(wordlist)
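The same idea (take the first words, then drop words from the end until the joined string fits) can be written as a small function instead of a command-line script. This is a sketch; the name `limit_text` is hypothetical, and it counts the spaces between words toward the character limit, so the returned string never exceeds `max_chars`.

```python
def limit_text(text, max_words, max_chars):
    """Return at most max_words words of text, dropping trailing words
    until the result (spaces included) is at most max_chars long."""
    words = text.split()[:max_words]
    while words and len(' '.join(words)) > max_chars:
        words.pop()  # drop the last word and try again
    return ' '.join(words)

wordlist = ["hi what's up home diddle mc doo", "Oh wise master kakarot", "hello have a da"]
print([limit_text(item, 3, 20) for item in wordlist])  # ["hi what's up", 'Oh wise master', 'hello have a']
print([limit_text(item, 3, 10) for item in wordlist])  # ["hi what's", 'Oh wise', 'hello have']
```

Note that a first word longer than `max_chars` produces an empty string, because the function only ever drops whole words.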

Personally, I think you overcomplicated the question at hand. You took something doable in a single line of code and somehow turned it into 20 lines of code. – Austin A Aug 13 '15 at 03:09
