I know that the pythonic way of concatenating a list of strings is to use:

l = ["a", "b", "c"]
"".join(l)

But how would I do this if I have a list of objects which contain a string (as an attribute), without reassigning the string?

I guess I could implement __str__(self), but that's a workaround I would prefer not to use.
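For comparison, the __str__ workaround mentioned in the question could look roughly like the sketch below. The class name Item and its attribute text are hypothetical. Once __str__ returns the attribute, map(str, ...) turns each object into its string before joining.

```python
class Item:
    """Hypothetical object holding its string in the `text` attribute."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Expose the wrapped string via str(), so str(item) == item.text
        return self.text


items = [Item("a"), Item("b"), Item("c")]

# map(str, ...) calls Item.__str__ on each object before joining
joined = "".join(map(str, items))
print(joined)  # abc
```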

TrebledJ
Mac C.

8 Answers

I guess the most pythonic way to do this is a generator expression or a list comprehension. If, for example, the string is stored in the attribute str_attr (obj_instance.str_attr), just run:

"".join(x.str_attr for x in l)

or

"".join([x.str_attr for x in l])

Edit: see the performance discussion below; the list comprehension (the second option) turns out to be faster with str.join.
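A self-contained version of both options, assuming a hypothetical class Obj whose string lives in the attribute str_attr (the attribute name used in this answer):

```python
class Obj:
    """Hypothetical object that stores its string in `str_attr`."""

    def __init__(self, str_attr):
        self.str_attr = str_attr


l = [Obj("a"), Obj("b"), Obj("c")]

# Option 1: generator expression passed straight to join
joined_gen = "".join(x.str_attr for x in l)

# Option 2: list comprehension, building the list before join sees it
joined_list = "".join([x.str_attr for x in l])

print(joined_gen, joined_list)  # abc abc
```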

Dimgold

What about something like:

joined = "".join([obj.string for obj in lst_object])
Pablo
  • You can leave out the square brackets so that you get a generator that yields each string as join needs it, instead of having to create the whole list first. – pat Jun 15 '17 at 14:11
  • @pat a list comprehension is faster than a generator (at least when used in `str.join`) – vaultah Jun 15 '17 at 14:20
  • @vaultah interesting, I've never timed it – pat Jun 15 '17 at 14:26
  • @pat Specifically for `join`, it needs to create the whole list first to know how long all the strings are, so that it can properly allocate the new string in memory. For this reason, in this specific case, a list comprehension is the better approach. In nearly every other situation the generator is the better choice. – SethMMorton Jun 15 '17 at 14:30
  • @pat [here's the relevant answer](https://stackoverflow.com/a/9061024/2301450). I think the same holds for `sorted`. – vaultah Jun 15 '17 at 14:30

The performance difference between generator expression and list comprehension is easy to measure:

python --version && python -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join(obj.a for obj in l)"
Python 2.7.12
10 loops, best of 3: 87.2 msec per loop

python --version && python -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join([obj.a for obj in l])"
Python 2.7.12
10 loops, best of 3: 77.1 msec per loop

python3.4 --version && python3.4 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join(obj.a for obj in l)"
Python 3.4.5
10 loops, best of 3: 77.4 msec per loop

python3.4 --version && python3.4 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join([obj.a for obj in l])"
Python 3.4.5
10 loops, best of 3: 66 msec per loop

python3.5 --version && python3.5 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join(obj.a for obj in l)"
Python 3.5.2
10 loops, best of 3: 82.8 msec per loop

python3.5 --version && python3.5 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join([obj.a for obj in l])"
Python 3.5.2
10 loops, best of 3: 64.9 msec per loop

python3.6 --version && python3.6 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join(obj.a for obj in l)"
Python 3.6.0
10 loops, best of 3: 84.6 msec per loop

python3.6 --version && python3.6 -m timeit -s \
  "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \
   "''.join([obj.a for obj in l])"
Python 3.6.0
10 loops, best of 3: 64.7 msec per loop

It turns out that the list comprehension is consistently faster than the generator expression:

  • 2.7: ~12% faster
  • 3.4: ~15% faster
  • 3.5: ~22% faster
  • 3.6: ~24% faster

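Note, however, that the list comprehension holds the whole intermediate list in memory. Since str.join converts a generator into a list internally anyway (see the comments under the previous answer), the peak memory difference is likely small.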
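The same comparison can also be run from inside Python with the standard timeit module. This is a sketch that mirrors the command-line benchmark above with a smaller default input; the function name compare and its parameters are illustrative, and absolute timings will vary by machine and Python version.

```python
import argparse
import timeit


def compare(n=100_000, repeat=3, number=10):
    """Time ''.join() over a generator vs. a list comprehension.

    Returns the best per-call time in seconds for each variant.
    """
    # Same setup as the shell benchmark: n objects with a string attribute `a`
    env = {"l": [argparse.Namespace(a=str(i)) for i in range(n)]}
    stmts = {
        "generator": "''.join(obj.a for obj in l)",
        "list": "''.join([obj.a for obj in l])",
    }
    results = {}
    for name, stmt in stmts.items():
        # repeat() returns one total per round; keep the fastest round
        times = timeit.repeat(stmt, repeat=repeat, number=number, globals=env)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1000:.2f} msec per loop")
```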

Update

Here is a Dockerfile you can use to reproduce the results on your own hardware, e.g. with docker build -t test-so . && docker run --rm test-so.

FROM saaj/snake-tank

RUN echo '[tox] \n\
envlist = py27,py33,py34,py35,py36 \n\
skipsdist = True \n\
[testenv] \n\
commands = \n\
  python --version \n\
  python -m timeit -s \\\n\
    "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \\\n\
    "str().join(obj.a for obj in l)" \n\
  python -m timeit -s \\\n\
    "import argparse; l = [argparse.Namespace(a=str(i)) for i in range(1000000)]" \\\n\
    "str().join([obj.a for obj in l])"' > tox.ini
CMD tox
saaj
  • A pity that the generator expression became slower in 3.5. – bli Jun 29 '17 at 12:47
  • @bli Well, in absolute numbers generator expressions run in the same time across all these versions. It just seems list comprehensions were optimised in 3.5. – saaj Jun 29 '17 at 13:00
  • Don't your measures show that generator expressions are slower in 3.5 (82.8 msec per loop) and 3.6 (84.6 msec per loop) than in 3.4 (77.4 msec per loop)? – bli Jun 29 '17 at 13:13
  • @bli That's also true. But this is a speculation so I added `Dockerfile` and you can see the result on your hardware. Results can vary. – saaj Jun 29 '17 at 14:45

You can first collect all the string attributes into a list of strings:

string_list = [myobj.str for myobj in l]

The code above builds the list with a list comprehension. Afterwards you can use the standard way to concatenate strings:

"".join(string_list)
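One advantage of building the intermediate list explicitly is that it can be reused, for example joined with different separators. A sketch with a hypothetical class MyObj exposing the attribute str, as in this answer:

```python
class MyObj:
    """Hypothetical object storing its string in the `str` attribute."""

    def __init__(self, s):
        self.str = s


l = [MyObj("a"), MyObj("b"), MyObj("c")]

# Step 1: collect the strings once
string_list = [myobj.str for myobj in l]

# Step 2: join them, possibly more than once with different separators
print("".join(string_list))    # abc
print(", ".join(string_list))  # a, b, c
```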

I3orn2FLY

A list comprehension helps here too. For example, with a list of dictionaries:

# data
data = [
  {'str': 'a', 'num': 1},
  {'str': 'b', 'num': 2},
]
joined_string = ''.join([item['str'] for item in data])
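For dictionaries, operator.itemgetter from the standard library can replace the comprehension. This is an alternative, not part of the original answer:

```python
from operator import itemgetter

data = [
    {'str': 'a', 'num': 1},
    {'str': 'b', 'num': 2},
]

# itemgetter('str') is a callable equivalent to lambda item: item['str']
joined_string = ''.join(map(itemgetter('str'), data))
print(joined_string)  # ab
```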
Leonard2

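Building on the previous answers:

"".join([x.str_attr if hasattr(x, 'str_attr') else x for x in l])

This works if your data types are simple, i.e. each item is either an object with str_attr or already a string. Otherwise, pass each item through a conversion function of your own:

''.join([somefunction(x) for x in l])

Have a look at the itertools module too; it can help if you also need to filter the values.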
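A sketch of combining the fallback with filtering: to_text is a hypothetical stand-in for somefunction, Obj is a hypothetical class, and itertools.filterfalse drops items whose converted text is empty (which matters once a separator is used).

```python
from itertools import filterfalse


class Obj:
    """Hypothetical object storing its string in `str_attr`."""

    def __init__(self, str_attr):
        self.str_attr = str_attr


def to_text(x):
    """Hypothetical converter: use str_attr if present, else assume x is a str."""
    return x.str_attr if hasattr(x, 'str_attr') else x


l = [Obj("a"), "b", Obj(""), Obj("c")]

# filterfalse(pred, it) keeps the items for which pred(item) is false,
# so empty strings are dropped after conversion
joined = ", ".join(filterfalse(lambda s: s == "", map(to_text, l)))
print(joined)  # a, b, c
```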

cgte

Another possibility is to use functional programming:

class StrObj:
    def __init__(self, s):
        self.str = s

a = StrObj('a')
b = StrObj('b')
c = StrObj('c')

l = [a, b, c]

"".join(map(lambda x: x.str, l))

This works however the string is connected to the object (directly as an attribute or in a more complicated way); only the lambda has to be adapted.
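If the lambda only reads an attribute, operator.attrgetter does the same and also accepts dotted paths for nested attributes. The classes Inner and Outer below are hypothetical, added only to show the nested case:

```python
from operator import attrgetter


class StrObj:
    def __init__(self, s):
        self.str = s


class Inner:
    """Hypothetical nested holder of the string."""

    def __init__(self, text):
        self.text = text


class Outer:
    """Hypothetical object whose string lives at outer.inner.text."""

    def __init__(self, text):
        self.inner = Inner(text)


l = [StrObj('a'), StrObj('b'), StrObj('c')]
flat = "".join(map(attrgetter('str'), l))  # same as map(lambda x: x.str, l)

nested = [Outer('x'), Outer('y')]
deep = "".join(map(attrgetter('inner.text'), nested))  # lambda x: x.inner.text

print(flat, deep)  # abc xy
```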

Johannes

A self-explanatory one-liner:

"".join(str(d.attr) for d in l)
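Note that the str() call also converts non-string attributes, which a plain join would reject with a TypeError. A sketch with a hypothetical class D whose attr holds integers:

```python
class D:
    """Hypothetical object with a non-string attribute."""

    def __init__(self, attr):
        self.attr = attr


l = [D(1), D(2), D(3)]

joined = "".join(str(d.attr) for d in l)
print(joined)  # 123

try:
    "".join(d.attr for d in l)  # str.join only accepts str items
except TypeError as exc:
    print("TypeError:", exc)
```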
nehem