350

How does one truncate a string to 75 characters in Python?

This is how it is done in JavaScript:

var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
var info = (data.length > 75) ? data.substring(0, 75) + '..' : data;
Adam Nelson
Hulk

22 Answers

578
info = (data[:75] + '..') if len(data) > 75 else data
Marcelo Cantos
185

Even more concise:

data = data[:75]

If it is less than 75 characters there will be no change.
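
This works because slicing, unlike indexing, does not raise when the bounds exceed the length of the string. A small check of that behavior, with sample values made up for illustration:

```python
# Slicing past the end of a string is safe; indexing past the end is not.
short = "hello"
long_text = "x" * 100

assert short[:75] == "hello"       # shorter than the limit: unchanged
assert len(long_text[:75]) == 75   # longer than the limit: cut to 75

try:
    short[75]                      # indexing out of range raises
except IndexError:
    index_failed = True
else:
    index_failed = False

print(index_failed)
```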

neil
161

Even shorter :

info = data[:75] + (data[75:] and '..')
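
The trick relies on `and` returning one of its operands: `data[75:]` is the empty string (falsy) when nothing was cut off, so the expression yields `''`; otherwise it yields `'..'`. A minimal sketch, where the helper name `truncate_and` and the sample inputs are assumptions made for illustration:

```python
def truncate_and(data, limit=75):
    # data[limit:] is '' (falsy) when data fits, so `and` returns ''
    # and nothing is appended; otherwise `and` returns '..'.
    return data[:limit] + (data[limit:] and '..')


print(truncate_and("short"))          # short
print(len(truncate_and("x" * 100)))   # 77: 75 characters plus '..'
```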
stanlekub
147

If you are using Python 3.4+, you can use textwrap.shorten from the standard library:

Collapse and truncate the given text to fit in the given width.

First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width:

>>> textwrap.shorten("Hello  world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello  world!", width=11)
'Hello [...]'
>>> textwrap.shorten("Hello world", width=10, placeholder="...")
'Hello...'
Bora M. Alper
  • 20
    It seems to crap its pants on really long strings (no spaces) and outputs only the ellipsis. – datu-puti Jul 26 '17 at 19:46
  • 13
    @elBradford (and interested others): that's because `shorten()` truncates *words*, not single characters. I searched but there doesn't seem a way to configure `shorten()` or a `TextWrapper` instance to clip single characters and not words. – Acsor Sep 10 '17 at 14:51
  • 2
    And it has the annoying side effect of removing line breaks – havlock Dec 07 '17 at 17:05
  • 5
    This does not solve OP’s question. It truncates by word and even removes whitespace. – Florian Wendelborn Apr 24 '19 at 15:32
  • To hard-wrap (ignore whitespace): `def shorten(s, width, placeholder='[...]'): return s[:width] if len(s) <= width else s[:width-len(placeholder)] + placeholder` – Orwellophile Jan 03 '22 at 05:53
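
The behavior described in the comments above can be reproduced with a short script. The inputs are made up for illustration; the point is that `textwrap.shorten` works on words and normalizes whitespace, while plain slicing works on characters:

```python
import textwrap

# A single long "word" does not fit, so only the placeholder survives.
no_spaces = "a" * 100
print(repr(textwrap.shorten(no_spaces, width=20)))              # '[...]'

# Whitespace, including line breaks, is collapsed to single spaces.
print(repr(textwrap.shorten("line one\nline two", width=40)))   # 'line one line two'

# Plain slicing cuts at the character level and keeps whitespace as it is.
print(repr(no_spaces[:20]))
```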
48

For a Django solution (which has not been mentioned in the question):

from django.utils.text import Truncator
value = Truncator(value).chars(75)

Have a look at Truncator's source code to appreciate the problem: https://github.com/django/django/blob/master/django/utils/text.py#L66

Concerning truncation with Django: Django HTML truncation

Community
Risadinha
15

With regex:

re.sub(r'^(.{75}).*$', r'\g<1>...', data)

Long strings are truncated:

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'111111111122222222223333333333444444444455555555556666666666777777777788888...'

Shorter strings never get truncated:

>>> data="11111111112222222222333333"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'11111111112222222222333333'

This way, you can also "cut" the middle part of the string, which is nicer in some cases:

re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)
'11111...88888'
Davide Guerri
  • Well, that didn't work when you have spaces in your string – holms Oct 19 '15 at 01:01
  • 1
    Why would you use regex for such a simple case? – Bora M. Alper Aug 18 '16 at 12:14
  • It does work with spaces. e.g. for the last one the output is: '111111111 222222222 333333333 444444444 55555555556666666666777777777 88888...' – Davide Guerri Mar 10 '21 at 10:23
  • `.*$` should be replaced with `.+$`, so that it only matches longer strings. Now a string that is exactly 75 characters long will get ellipsis without actually truncating anything. – Frax Jul 06 '23 at 11:48
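
The difference pointed out in the last comment can be checked directly. A small sketch with a made-up input of exactly 75 characters:

```python
import re

exactly_75 = "x" * 75

# Original pattern: `.*` also matches an empty remainder, so a string of
# exactly 75 characters gets an ellipsis although nothing was cut.
with_star = re.sub(r'^(.{75}).*$', r'\g<1>...', exactly_75)

# Suggested pattern: `.+` requires at least one character beyond the 75th.
with_plus = re.sub(r'^(.{75}).+$', r'\g<1>...', exactly_75)

print(len(with_star))   # 78
print(len(with_plus))   # 75
```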
14
limit = 75
info = data[:limit] + '..' * (len(data) > limit)
HelloGoodbye
  • 1
    This is the most elegant solution. Additionally I would extract the chars limit (in this case `75`) into a variable to avoid inconsistencies. `limit = 75; info = data[:limit] + '..' * (len(data) > limit)` – ekauffmann Oct 25 '18 at 13:41
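
The multiplication works because `bool` is a subclass of `int` in Python, so `True` and `False` act as 1 and 0 when a string is repeated. A minimal sketch (the helper name `truncate_mul` is an assumption for illustration):

```python
# bool is a subclass of int: True behaves as 1 and False as 0.
assert isinstance(True, int)
assert '..' * True == '..'
assert '..' * False == ''


def truncate_mul(data, limit=75):
    # The suffix is repeated once when data is too long, zero times otherwise.
    return data[:limit] + '..' * (len(data) > limit)


print(truncate_mul("short"))   # short
```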
6

This method doesn't use any if:

data[:75] + bool(data[75:]) * '..'

Sassan
  • 4
    I wrote it only to show that it's possible. It's against Python's readability philosophy. It doesn't have any performance advantage compared with other "if"-based methods. I never use it and I don't suggest you use it either. – Sassan Aug 23 '16 at 09:19
6

This just in:

n = 8
s = '123'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '12345678'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789012345'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])

123
12345678
12345...
12345...
dansalmo
  • 2
    All of the previous answers neglect to consider what the OP really wanted - an output string no longer than 75 characters. Kudos for understanding the "don't do what I say, do what I want" programming principle. For completeness you could fix the corner case of n<3 by appending: if n > 2 else s[:n] – Dave Apr 15 '20 at 16:35
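
One way to read the comment above is as a function whose result, suffix included, never exceeds the limit. The sketch below is an assumption about what such a helper could look like (the name `truncate_to` and the fallback for very small limits are not from the answer; here a limit too small for the suffix falls back to a plain cut):

```python
def truncate_to(s, limit, suffix='...'):
    """Return s cut so that the result, suffix included, is at most `limit` long."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(s) <= limit:
        return s
    if limit <= len(suffix):
        # No room for text plus suffix: fall back to a plain cut.
        return s[:limit]
    return s[:limit - len(suffix)] + suffix


for text in ('123', '12345678', '123456789'):
    print(truncate_to(text, 8))
# 123
# 12345678
# 12345...
print(truncate_to('123456789', 2))   # 12
```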
5
info = data[:75] + ('..' if len(data) > 75 else '')
HelloGoodbye
5
info = data[:min(len(data), 75)]
Stephen Kennedy
Jesse
4

You can't actually "truncate" a Python string like you can do a dynamically allocated C string. Strings in Python are immutable. What you can do is slice a string as described in other answers, yielding a new string containing only the characters defined by the slice offsets and step. In some (non-practical) cases this can be a little annoying, such as when you choose Python as your interview language and the interviewer asks you to remove duplicate characters from a string in-place. Doh.

Dave
  • The question resulted from the programming language JavaScript and not C. There the strings are also immutable. – colidyre Feb 18 '21 at 09:14
  • 1
    The question was "How does one truncate a string to 75 characters in Python?". The answer is "You can't". That the OP thinks Javascript substring == truncate is irrelevant. Further, the point of my answer is that the Pythonic idiom that is used is string "slicing" in instances where in - for example C - you might truncate a string. It saves allocation and duplication by just using a couple of pointers into the existing string. – Dave Feb 18 '21 at 15:00
  • 2
    I think it is clear that the OP doesn't mean to truncate the original string. He/She wants obviously the same behavior as in JavaScript. But nevertheless, your answer is very correct and could help others to understand that also in Python strings are immutable and you're not transforming the original string. (+1) – colidyre Feb 18 '21 at 15:09
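
The immutability point can be seen in a few lines. This is a small illustration with made-up data: the slice is a separate string object, the original keeps its length, and in-place modification is rejected.

```python
data = "x" * 100
info = data[:75]               # a new str object; data is left untouched

print(len(data), len(info))    # 100 75
print(info is data)            # False

try:
    data[0] = "y"              # str does not support item assignment
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False

print(assignment_failed)       # True
```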
3

Yet another solution. With True and False you get a little feedback about the test at the end.

data = {True: data[:75] + '..', False: data}[len(data) > 75]
David Maust
3

Coming very late to the party I want to add my solution to trim text at character level that also handles whitespaces properly.

def trim_string(s: str, limit: int, ellipsis='…') -> str:
    s = s.strip()
    if len(s) > limit:
        return s[:limit-1].strip() + ellipsis
    return s

Simple, but it makes sure that hello world with limit=6 will not result in an ugly hello … but hello… instead.

It also removes leading and trailing whitespace, but not spaces inside. If you also want to remove spaces inside, check out this Stack Overflow post
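
A short usage sketch for the function above, repeated here so the snippet runs on its own. The inputs are made up; the first call shows the case where the cut lands right after a space, which is where the inner `strip()` matters:

```python
def trim_string(s: str, limit: int, ellipsis='…') -> str:
    # Same function as in the answer, repeated for a self-contained example.
    s = s.strip()
    if len(s) > limit:
        return s[:limit-1].strip() + ellipsis
    return s


print(trim_string("hello world", 7))        # hello…
print(trim_string("  hello world  ", 11))   # hello world
print(trim_string("hello   world", 20))     # hello   world
```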

NicoHood
1
>>> info = lambda data: len(data)>10 and data[:10]+'...' or data
>>> info('sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdf')
'sdfsdfsdfs...'
>>> info('sdfsdf')
'sdfsdf'
Spouk
1

Simple and short helper function:

def truncate_string(value, max_length=255, suffix='...'):
    string_value = str(value)
    # Return the value unchanged when it already fits; otherwise it would be
    # cut to max_length - len(suffix) without getting the suffix.
    if len(string_value) <= max_length:
        return string_value
    return string_value[:max(0, max_length - len(suffix))] + suffix

Usage examples:

# Example 1 (default):

long_string = ""
for number in range(1, 1000): 
    long_string += str(number) + ','    

result = truncate_string(long_string)
print(result)


# Example 2 (custom length):

short_string = 'Hello world'
result = truncate_string(short_string, 8)
print(result) # > Hello... 


# Example 3 (not truncated):

short_string = 'Hello world'
result = truncate_string(short_string)
print(result) # > Hello world

lacroixDj
1

Here I use textwrap.shorten and handle more edge cases. It also includes the start of the next word when the word-based result would fill less than 50% of the available width.

import textwrap


def shorten(text: str, width=30, placeholder="..."):
    """Collapse and truncate the given text to fit in the given width.

    The text first has its whitespace collapsed. If it then fits in the *width*, it is returned as is.
    Otherwise, as many words as possible are joined and then the placeholder is appended.
    """
    if not text or not isinstance(text, str):
        return str(text)
    t = text.strip()
    if len(t) <= width:
        return t

    # textwrap.shorten also throws ValueError if placeholder too large for max width
    shorten_words = textwrap.shorten(t, width=width, placeholder=placeholder)

    # textwrap.shorten doesn't split words, so if the text contains a long word without spaces, the result may be too short without this word.
    # Here we use a different way to include the start of this word in case shorten_words is less than 50% of `width`
    if len(shorten_words) - len(placeholder) < (width - len(placeholder)) * 0.5:
        return t[:width - len(placeholder)].strip() + placeholder
    return shorten_words

Tests:

>>> shorten("123 456", width=7, placeholder="...")
'123 456'
>>> shorten("1 23 45 678 9", width=12, placeholder="...")
'1 23 45...'
>>> shorten("1 23 45 678 9", width=10, placeholder="...")
'1 23 45...'
>>> shorten("01 23456789", width=10, placeholder="...")
'01 2345...'
>>> shorten("012 3 45678901234567", width=17, placeholder="...")
'012 3 45678901...'
>>> shorten("1 23 45 678 9", width=9, placeholder="...")
'1 23...'
>>> shorten("1 23456", width=5, placeholder="...")
'1...'
>>> shorten("123 456", width=5, placeholder="...")
'12...'
>>> shorten("123 456", width=6, placeholder="...")
'123...'
>>> shorten("12 3456789", width=9, placeholder="...")
'12 345...'
>>> shorten("   12 3456789    ", width=9, placeholder="...")
'12 345...'
>>> shorten('123 45', width=4, placeholder="...")
'1...'
>>> shorten('123 45', width=3, placeholder="...")
'...'
>>> shorten("123456", width=3, placeholder="...")
'...'
>>> shorten([1], width=9, placeholder="...")
'[1]'
>>> shorten(None, width=5, placeholder="...")
'None'
>>> shorten("", width=9, placeholder="...")
''
Noam Nol
1

If you wish to do some more sophisticated string truncation, you can adopt the sklearn approach as implemented by:

sklearn.base.BaseEstimator.__repr__ (See Original full code at: https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/base.py#L262)

It adds benefits such as avoiding truncation in the middle of a word.

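import re


def truncate_string(data, N_CHAR_MAX=70):
    # N_CHAR_MAX is the (approximate) maximum number of non-blank
    # characters to render. We pass it as an optional parameter to ease
    # the tests.

    # Strings at or below the limit are returned unchanged; without this
    # guard the regex below does not match and re.match returns None.
    if len("".join(data.split())) <= N_CHAR_MAX:
        return data

    lim = N_CHAR_MAX // 2  # approx. number of chars to keep on both ends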
    regex = r"^(\s*\S){%d}" % lim
    # The regex '^(\s*\S){%d}' % n
    # matches from the start of the string until the nth non-blank
    # character:
    # - ^ matches the start of string
    # - (pattern){n} matches n repetitions of pattern
    # - \s*\S matches a non-blank char following zero or more blanks
    left_lim = re.match(regex, data).end()
    right_lim = re.match(regex, data[::-1]).end()
    if "\n" in data[left_lim:-right_lim]:
        # The left side and right side aren't on the same line.
        # To avoid weird cuts, e.g.:
        # categoric...ore',
        # we need to start the right side with an appropriate newline
        # character so that it renders properly as:
        # categoric...
        # handle_unknown='ignore',
        # so we add [^\n]*\n which matches until the next \n
        regex += r"[^\n]*\n"
        right_lim = re.match(regex, data[::-1]).end()
    ellipsis = "..."
    if left_lim + len(ellipsis) < len(data) - right_lim:
        # Only add ellipsis if it results in a shorter repr
        data = data[:left_lim] + "..." + data[-right_lim:]
    return data
Hagai Drory
0

There's no need for a regular expression but you do want to use string formatting rather than the string concatenation in the accepted answer.

This is probably the most canonical, Pythonic way to truncate the string data at 75 characters.

>>> data = "saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
>>> info = "{}..".format(data[:75]) if len(data) > 75 else data
>>> info
'111111111122222222223333333333444444444455555555556666666666777777777788888...'
Adam Nelson
  • I found it funny how your `saddddddd...` string turns into `111111...`:) I know it's a copy-paste typo though, and I agree with you about regular expressions. – akarilimano Jan 29 '18 at 08:46
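
On Python 3.6 and later, the same formatting idea can be written with an f-string. A minimal sketch, with sample data made up for illustration:

```python
data = "x" * 100   # sample input, made up for illustration

# Format the first 75 characters plus '..' only when data is too long.
info = f"{data[:75]}.." if len(data) > 75 else data
print(len(info))   # 77
```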
0

Here's a function I made as part of a new String class... It allows adding a suffix (if the string, after trimming and adding the suffix, is still long enough, although you don't need to enforce the absolute size).

I was in the process of changing a few things around so there are some useless logic costs ( if _truncate ... for instance ) where it is no longer necessary and there is a return at the top...

But, it is still a good function for truncating data...

##
## Truncate characters of a string after _len'nth char, if necessary... If _len is less than 0, don't truncate anything... Note: If you attach a suffix, and you enable absolute max length then the suffix length is subtracted from max length... Note: If the suffix length is longer than the output then no suffix is used...
##
## Usage: Where _text = 'Testing', _width = 4
##      _data = String.Truncate( _text, _width )                        == Test
##      _data = String.Truncate( _text, _width, '..', True )            == Te..
##
## Equivalent Alternates: Where _text = 'Testing', _width = 4
##      _data = String.SubStr( _text, 0, _width )                       == Test
##      _data = _text[  : _width ]                                      == Test
##      _data = ( _text )[  : _width ]                                  == Test
##
def Truncate( _text, _max_len = -1, _suffix = False, _absolute_max_len = True ):
    ## Length of the string we are considering for truncation
    _len            = len( _text )

    ## Whether or not we have to truncate ( a negative _max_len disables truncation, as documented above )
    _truncate       = _max_len >= 0 and _len > _max_len

    ## Note: If we don't need to truncate, there's no point in proceeding...
    if ( not _truncate ):
        return _text

    ## The suffix in string form
    _suffix_str     = ( '',  str( _suffix ) )[ _truncate and _suffix != False ]

    ## The suffix length
    _len_suffix     = len( _suffix_str )

    ## Whether or not we add the suffix
    _add_suffix     = ( False, True )[ _truncate and _suffix != False and _max_len > _len_suffix ]

    ## Suffix Offset
    _suffix_offset = _max_len - _len_suffix
    _suffix_offset  = ( _max_len, _suffix_offset )[ _add_suffix and _absolute_max_len != False and _suffix_offset > 0 ]

    ## The truncate point.... If not necessary, then length of string.. If necessary then the max length with or without subtracting the suffix length... Note: It may be easier ( less logic cost ) to simply add the suffix to the calculated point, then truncate - if point is negative then the suffix will be destroyed anyway.
    ## If we don't need to truncate, then the length is the length of the string.. If we do need to truncate, then the length depends on whether we add the suffix and offset the length of the suffix or not...
    _len_truncate   = ( _len, _max_len )[ _truncate ]
    _len_truncate   = ( _len_truncate, _max_len )[ _len_truncate <= _max_len ]

    ## If we add the suffix, add it... Suffix won't be added if the suffix is the same length as the text being output...
    if ( _add_suffix ):
        _text = _text[ 0 : _suffix_offset ] + _suffix_str + _text[ _suffix_offset: ]

    ## Return the text after truncating...
    return _text[ : _len_truncate ]
Acecool
0

Suppose that stryng is a string which we wish to truncate and that nchars is the number of characters desired in the output string.

stryng = "sadddddddddddddddddddddddddddddddddddddddddddddddddd"
nchars = 10

We can truncate the string as follows:

def truncate(stryng:str, nchars:int):
    return (stryng[:nchars - 6] + " [...]")[:min(len(stryng), nchars)]

The results for certain test cases are shown below:

s = "sadddddddddddddddddddddddddddddd!"
s = "sa" + 30*"d" + "!"

truncate(s, 2)                ==  sa
truncate(s, 4)                ==  sadd
truncate(s, 10)               ==  sadd [...]
truncate(s, len(s)//2)        ==  sadddddddd [...]

My solution produces reasonable results for the test cases above.

However, some pathological cases are shown below:

Some Pathological Cases!

truncate(s, len(s) - 3)()       ==  sadddddddddddddddddddddd [...]
truncate(s, len(s) - 2)()       ==  saddddddddddddddddddddddd [...]
truncate(s, len(s) - 1)()       ==  sadddddddddddddddddddddddd [...]
truncate(s, len(s) + 0)()       ==  saddddddddddddddddddddddddd [...]
truncate(s, len(s) + 1)()       ==  sadddddddddddddddddddddddddd [...
truncate(s, len(s) + 2)()       ==  saddddddddddddddddddddddddddd [..
truncate(s, len(s) + 3)()       ==  sadddddddddddddddddddddddddddd [.
truncate(s, len(s) + 4)()       ==  saddddddddddddddddddddddddddddd [
truncate(s, len(s) + 5)()       ==  sadddddddddddddddddddddddddddddd 
truncate(s, len(s) + 6)()       ==  sadddddddddddddddddddddddddddddd!
truncate(s, len(s) + 7)()       ==  sadddddddddddddddddddddddddddddd!
truncate(s, 9999)()             ==  sadddddddddddddddddddddddddddddd!

Notably,

  • When the string contains new-line characters (\n) there could be an issue.
  • When nchars > len(s) we should print string s without trying to print the "[...]"

Below is some more code:

import io

class truncate:
    r"""
    Example of code which uses truncate:

        s = "\r<class\n 'builtin_function_or_method'>"
        s = truncate(s, 10)()
        print(s)

    Examples of inputs and outputs:

        truncate(s, 2)()   ==  \r
        truncate(s, 4)()   ==  \r<c
        truncate(s, 10)()  ==  \r<c [...]
        truncate(s, 20)()  ==  \r<class\n 'bu [...]
        truncate(s, 999)() ==  \r<class\n 'builtin_function_or_method'>

    Other notes:
        Returns a modified copy of string input
        Does not modify the original string
    """
    def __init__(self, x_stryng: str, x_nchars: int) -> None:
        """
        This initializer mostly exists to sanitize function inputs
        """
        try:
            stryng = repr("".join(str(ch) for ch in x_stryng))[1:-1]
            nchars = int(str(x_nchars))
        except BaseException as exc:
            invalid_stryng =  str(x_stryng)
            invalid_stryng_truncated = repr(type(self)(invalid_stryng, 20)())

            invalid_x_nchars = str(x_nchars)
            invalid_x_nchars_truncated = repr(type(self)(invalid_x_nchars, 20)())

            strm = io.StringIO()
            print("Invalid Function Inputs", file=strm)
            print(type(self).__name__, "(",
                  invalid_stryng_truncated,
                  ", ",
                  invalid_x_nchars_truncated, ")", sep="", file=strm)
            msg = strm.getvalue()

            raise ValueError(msg) from None

        self._stryng = stryng
        self._nchars = nchars

    def __call__(self) -> str:
        stryng = self._stryng
        nchars = self._nchars
        return (stryng[:nchars - 6] + " [...]")[:min(len(stryng), nchars)]
Toothpick Anemone
0

Here's a simple function that will truncate a given string from either side:

def truncate(string, length=75, beginning=True, insert='..'):
    '''Shorten the given string to the given length.
    An ellipsis will be added to the section trimmed.

    :Parameters:
        length (int) = The maximum allowed length before truncating.
        beginning (bool) = Trim starting chars, else; ending.
        insert (str) = Chars to add at the trimmed area. (default: ellipsis)

    :Return:
        (str)

    ex. call: truncate('12345678', 4)
        returns: '..5678'
    '''
    if len(string)>length:
        if beginning: #trim starting chars.
            string = insert+string[-length:]
        else: #trim ending chars.
            string = string[:length]+insert
    return string
m3trik