Split a string and save them to a list with Python

Question

I have a string into which I inserted a space at every possible position, and I saved the resulting strings to a list. Now I want to split each of those strings and put all the pieces into one flat list. When I do this, I end up with multiple lists nested inside the outer list.

This is the code I am working on:

var = 'sans'
res = [var[:i] + ' ' + var[i:] for i in range(len(var))]
# The previous line: I am adding a space to see whether that would generate other words
cor = [res[i].split() for i in range(len(res))]

And this is the output I am getting:

>>> cor
[['sans'], ['s', 'ans'], ['sa', 'ns'], ['san', 's']]

What I am expecting:

>>> cor
['sans', 's', 'ans', 'sa', 'ns', 'san', 's']

I am new to Python and I don't know what I am missing.

Thanks

pault · Accepted Answer · 2017-12-30T01:07:20.930

6

An alternative approach:

cor = " ".join(res).split()

Output:

['sans', 's', 'ans', 'sa', 'ns', 'san', 's']

Explanation

" ".join(res) will join the individual strings in res with a space in between them. Then calling .split() will split this string on whitespace back into a list.

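To make the two steps visible, here is a minimal sketch that uses the `var` and `res` from the question. The intermediate name `joined` is introduced only for illustration.

```python
var = 'sans'
# Insert a space at every position, as in the question.
res = [var[:i] + ' ' + var[i:] for i in range(len(var))]

# Step 1: glue all the strings together, separated by single spaces.
joined = " ".join(res)

# Step 2: split on runs of whitespace; leading whitespace is ignored.
cor = joined.split()
print(cor)
```

Because `split()` with no arguments ignores leading and trailing whitespace, the space at the front of the first element of `res` does not produce an empty string.
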
EDIT: A second approach that doesn't involve the intermediate variable res, although this one isn't quite as easy on the eyes:

cor = [var[:i//2+1] if i%2==1 else var[i//2:] for i in range(2*len(var)-1)]

Basically, you alternate between taking a suffix of var (when i is even) and a prefix of var (when i is odd).
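
The same alternation may be easier to follow as an explicit loop. This is only a sketch of the idea; the name `half` is illustrative, and it uses floor division (`//`) because Python 3 requires integer slice indices.

```python
var = 'sans'
cor = []
for i in range(2 * len(var) - 1):
    half = i // 2  # floor division keeps the slice index an integer
    if i % 2 == 1:
        cor.append(var[:half + 1])  # odd i: prefix of var
    else:
        cor.append(var[half:])      # even i: suffix of var
print(cor)
```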

edited Dec 30 '17 at 01:07

answered Dec 30 '17 at 00:44

pault

@roganjosh Compared to *yours* maybe :-P – Stefan Pochmann Dec 30 '17 at 00:52
@StefanPochmann haha, oh come on, this is gonna have to come down to a `timeit` because this answer beats even a list comp IMO in simplicity :P – roganjosh Dec 30 '17 at 00:53
@StefanPochmann's answer is likely the best (he's got my upvote) but double list comprehensions can be difficult to understand for someone that's "new to python." – pault Dec 30 '17 at 00:56
@pault only `timeit` will tell :P – roganjosh Dec 30 '17 at 00:56
@roganjosh Well, I'm not saying it's not simpler or more elegant than mine, I just don't think it's "sickeningly" so :-). Also, not sure it's correct in all cases (when the input string has spaces, I think we might get different results). – Stefan Pochmann Dec 30 '17 at 00:57
(Not saying differing means mine's the correct one... And I btw upvoted this one, I like it a lot :-) – Stefan Pochmann Dec 30 '17 at 00:59
@StefanPochmann building a test bed for it, but I think the observation about the whitespace is perfectly valid too – roganjosh Dec 30 '17 at 01:00
Another way to do it without building the `res` variable is: `cor = [var[:i//2+1] if i%2==1 else var[i//2:] for i in range(2*len(var)-1)]` but that's not so elegant. – pault Dec 30 '17 at 01:04
@pault Tried direct as well: `[s for i in range(len(var)) for s in (var[:i], var[i:])][1:]`. But I don't like it. – Stefan Pochmann Dec 30 '17 at 01:12
@StefanPochmann ok, maybe not sickeningly elegant. This answer: `945 ns ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)`. Mine: `3.54 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)`. Yours: `2.16 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)`. Easy in syntax, but grossly inefficient. – roganjosh Dec 30 '17 at 01:13
@roganjosh Wait, what? Which one are you talking about when you say easy but inefficient? – Stefan Pochmann Dec 30 '17 at 01:16
@roganjosh But it's much *faster* than ours. – Stefan Pochmann Dec 30 '17 at 01:18
@StefanPochmann Oh, it's time for bed for me again, I shouldn't comment after midnight :( – roganjosh Dec 30 '17 at 01:20
@StefanPochmann I set [this](https://pastebin.com/6vtS7Ffp) up for testing. I clearly couldn't read the %timeit outputs properly so it would be prudent to check it since I got ms and ns confused. – roganjosh Dec 30 '17 at 01:28
@roganjosh Yeah you made a big error there. I'm not spelled with "ph" :-). Also, there's no timing code, it's just the solutions. – Stefan Pochmann Dec 30 '17 at 01:30
@StefanPochmann I used the magic %timeit in iPython. But yeah, I screwed up on the name too... I'm done for the night! – roganjosh Dec 30 '17 at 01:33
@roganjosh Yours btw suffers from using the OP's bad list comprehension. – Stefan Pochmann Dec 30 '17 at 01:34
@StefanPochmann for sure, which is why I gave you the code I used to test. Otherwise, I don't know a fair way to test since the other two answers live with that, and including it in your approach would be unfair in your timings, since the list comp in the function would slow your answer down. Try it as a full approach for all functions. – roganjosh Dec 30 '17 at 01:37
@pault I think your first approach wins in terms of efficiency, I ran some tests [here](https://ideone.com/zozhMt). – RoadRunner Dec 30 '17 at 03:03
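
The timing comparisons discussed in these comments can be approximated with the standard `timeit` module. The sketch below is not the code behind the pastebin or ideone links; the function names are hypothetical, and the numbers it prints will vary with the machine and Python version.

```python
import timeit

var = 'sans'
res = [var[:i] + ' ' + var[i:] for i in range(len(var))]

def join_split(strings):
    # Join everything with spaces, then split once.
    return " ".join(strings).split()

def nested_comprehension(strings):
    # Split each string and flatten in a single comprehension.
    return [s for r in strings for s in r.split()]

def split_then_flatten(strings):
    # Build the nested list first, then flatten it.
    nested = [r.split() for r in strings]
    return [x for y in nested for x in y]

runs = 10000
expected = join_split(res)
for func in (join_split, nested_comprehension, split_then_flatten):
    # Check that every candidate produces the same list before timing it.
    assert func(res) == expected
    seconds = timeit.timeit(lambda: func(res), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.3f} us per call")
```

Timing each function in a fresh process is a reasonable precaution, since a later comment in this thread reports that test order affected results on a shared host.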

score 3 · Answer 2 · answered Dec 30 '17 at 00:48

3

First of all, your

[res[i].split() for i in range (len(res))]

is a complicated unpythonic way to do the same as this:

[r.split() for r in res]

Now... the problem is that you treat r.split() as your end result. You should instead treat it as an intermediate list and iterate over it with a second for clause:

[s for r in res for s in r.split()]
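
For readers new to nested comprehensions, the `for` clauses read left to right in the same order as nested loops. A minimal sketch of the equivalent loops, using the `var` and `res` from the question:

```python
var = 'sans'
res = [var[:i] + ' ' + var[i:] for i in range(len(var))]

cor = []
for r in res:            # first clause:  for r in res
    for s in r.split():  # second clause: for s in r.split()
        cor.append(s)    # the leading expression: s
print(cor)
```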

answered Dec 30 '17 at 00:48

Stefan Pochmann

This approach seems to be quite fast also, as seen [here](https://ideone.com/zozhMt). – RoadRunner Dec 30 '17 at 03:02
@RoadRunner With `var ='sans' * 1000` and `n = 40` it looks better: https://ideone.com/6dC8Rb. But interestingly, Dekel's is even faster then. That didn't seem right, so I tested them locally independently (also with Python 3.5) and mine was faster there. Then I moved mine to the end and lo and behold, it became the fastest: https://ideone.com/yYXRrn. Then I moved *yours* to the end and *it* became almost the fastest: https://ideone.com/sN5Gnn. I guess I'll have to stop benchmarking on ideone. Seems like later tests benefit from earlier tests increasing the process priority or so. – Stefan Pochmann Dec 30 '17 at 03:30

Dekel · Answer 3 · 2017-12-30T00:45:23.850

2

If you have a list

cor = [['sans'], ['s', 'ans'], ['sa', 'ns'], ['san', 's']]

And you want to flatten it, you can use the following:

flat = [x for y in cor for x in y]

The output will be:

['sans', 's', 'ans', 'sa', 'ns', 'san', 's']

You can also make that directly with the res variable:

cor = [x for y in [res[i].split() for i in range (len(res))] for x in y]
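
As the comments on this answer point out, the one-liner above still builds the nested list before flattening it. One possible variation, sketched here as an assumption rather than as part of the original answer, swaps the inner list comprehension for a generator expression so that each split result is consumed as it is produced:

```python
var = 'sans'
res = [var[:i] + ' ' + var[i:] for i in range(len(var))]

# The parenthesised inner part is a generator expression, so no nested list is stored.
cor = [x for y in (r.split() for r in res) for x in y]
print(cor)
```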

edited Dec 30 '17 at 00:45

answered Dec 30 '17 at 00:43

Dekel

Would be better to not build that `cor` in the first place. – Stefan Pochmann Dec 30 '17 at 00:44
@StefanPochmann you are right, added an example for that as well – Dekel Dec 30 '17 at 00:45
Nah, that's not what I meant. You're still building that list (even in the exact same bad way). – Stefan Pochmann Dec 30 '17 at 00:46

RoadRunner · Answer 4 · 2017-12-30T02:09:27.757

1

You could always use map() to split each string in res:

list(map(str.split, res))

Which gives:

[['sans'], ['s', 'ans'], ['sa', 'ns'], ['san', 's']]

Then you can use itertools.chain.from_iterable to flatten the list:

from itertools import chain

list(chain.from_iterable(map(str.split, res)))

Which outputs:

['sans', 's', 'ans', 'sa', 'ns', 'san', 's']

edited Dec 30 '17 at 02:09

answered Dec 30 '17 at 02:01

RoadRunner

score 0 · Answer 5 · answered Jan 01 '18 at 03:24

You could do something like this in one line without importing any module:

var ='sans'

final=[]
list(map(lambda x:list(map(lambda y:final.append(y),x)),[(var[i:]+' '+var[:i]).split() for i in range(0,len(var))]))
print(final)

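Output (note that the order differs from the list expected in the question, because each suffix is placed before its prefix):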

['sans', 'ans', 's', 'ns', 'sa', 's', 'san']
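
The one-liner above calls `map` only for the side effect of `final.append`. A plain loop that should produce the same list is sketched below; it keeps the answer's suffix-then-prefix construction and uses `list.extend` in place of the nested `map` calls.

```python
var = 'sans'

final = []
for i in range(len(var)):
    # Suffix first, then prefix, exactly as in the one-liner.
    final.extend((var[i:] + ' ' + var[:i]).split())
print(final)
```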
