Joining side by side words from list in Python

Question

I have a bunch of lists, built from the lines of a .txt file that were read in as strings and split, that look like:

['New', 'Jersey', '1', '0', '1', '999']
['West', 'North', 'Central', '1', '0', '100', '90']

These lists have differing numbers of side-by-side words (the first has 2, the second has 3, etc.).

I want to output a new list (which then goes into a compiled dataframe) that joins the side-by-side words together, like:

['New Jersey', '1', '0', '1', '999']
['West North Central', '1', '0', '100', '90']

This will make every new list (and the dataframe) the same length.

It's easy to just append(line.split()) into a new list for each string, but I can't figure out the if-statement and .join() needed to join all the words and append each number separately.

jpp · Accepted Answer · 2018-10-19T17:08:44.367

6

Using itertools.groupby, you can group by str.isalpha, join strings conditionally, and then chain the results:

from itertools import chain, groupby

L = ['New', 'Jersey', '1', '0', '1', '999']

grouper = groupby(L, key=str.isalpha)
joins = [[' '.join(v)] if alpha_flag else list(v) for alpha_flag, v in grouper]
res = list(chain.from_iterable(joins))

print(res)

['New Jersey', '1', '0', '1', '999']
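
To apply the same idea to every line read from the file, the steps above can be wrapped in a small helper. This is only a sketch: `join_words`, `rows` and `joined` are names made up for the illustration and do not appear in the answer.

```python
from itertools import chain, groupby

def join_words(tokens):
    """Join each run of consecutive alphabetic tokens into one string."""
    groups = groupby(tokens, key=str.isalpha)
    # A run of words becomes a one-element list; other runs are kept as-is.
    parts = ([' '.join(g)] if is_alpha else list(g) for is_alpha, g in groups)
    return list(chain.from_iterable(parts))

rows = [
    ['New', 'Jersey', '1', '0', '1', '999'],
    ['West', 'North', 'Central', '1', '0', '100', '90'],
]
joined = [join_words(row) for row in rows]
print(joined)
```

Because groupby only groups adjacent items, the original order of the tokens is preserved.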

edited Oct 19 '18 at 17:08

answered Oct 19 '18 at 17:07

jpp


@pault, Wow, and 4 seconds apart! – jpp Oct 19 '18 at 17:09
I had exactly the same idea too! that said, i'm never sure if these constructs with nesting and flattening are better than a regular for loop with append and extend – Chris_Rands Oct 19 '18 at 17:14
I had to tend to the kids while composing my answer so I was late, but yeah the basic idea is the same. – blhsing Oct 19 '18 at 17:17
@blhsing `'kids' > 'SO'` is `True` fortunately ;) – Chris_Rands Oct 19 '18 at 17:18
@blhsing (moved comment from your answer to here) - Though it won't make a huge deal in this particular example, using `list.__add__` is generally a [bad way of flattening nested lists](https://stackoverflow.com/a/41772165/5858851). – pault Oct 19 '18 at 17:19
@pault Since the OP has a very specific input format that consists of exactly two groups in every list I thought it'd be a waste to use `chain` and instead used a binary operator. But in general your point is well taken. – blhsing Oct 19 '18 at 17:22
@blhsing im surprised you and jpp went groupby on this one, maybe I fell out of the great minds connection :/ – vash_the_stampede Oct 19 '18 at 21:54

mVChr · Answer 2 · 2018-10-19T17:11:27.683

0

line = ['West', 'North', 'Central', '1', '0', '100', '90']
words = []
nums = []

for word in line:
    if word.isalpha():
        words.append(word)
    else:
        nums.append(word)

new_line = [' '.join(words)]
new_line.extend(nums)

# new_line == ['West North Central', '1', '0', '100', '90']

edited Oct 19 '18 at 17:11

answered Oct 19 '18 at 17:04

mVChr


`if word.isalpha():` could replace `all(...)`, also could append/extend to `new_line` in the loop – Chris_Rands Oct 19 '18 at 17:06

score 0 · Answer 3 · answered Oct 19 '18 at 17:13

You can write your own function to do the concatenation, for example:

l = [
    ['New', 'Jersey', '1', '0', '1', '999'],
    ['West', 'North', 'Central', '1', '0', '100', '90']]

def my_concat(l):
    nl = []
    cur = None  # words collected for the current run, or None
    delim = ""
    for i in l:
        if isinstance(i, str) and i.isalpha():
            if cur is None:
                cur = ""
            cur += delim + i
            delim = " "
        else:
            if cur is not None:
                nl.append(cur)
                cur = None
                delim = ""
            nl.append(i)
    if cur is not None:
        # flush a run of words that ends the list
        nl.append(cur)
    return nl

for i in l:
    print(my_concat(i))

output:

['New Jersey', '1', '0', '1', '999']
['West North Central', '1', '0', '100', '90']

score 0 · Answer 4 · answered Oct 19 '18 at 17:14

You can use itertools.groupby:

from itertools import groupby
l = [
    ['New', 'Jersey', '1', '0', '1', '999'],
    ['West', 'North', 'Central', '1', '0', '100', '90']
]
print([list.__add__(*(list(g) if k else [' '.join(g)] for k, g in groupby(s, key=str.isdigit))) for s in l])

This outputs:

[['New Jersey', '1', '0', '1', '999'], ['West North Central', '1', '0', '100', '90']]
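
Note that `list.__add__` takes exactly two operands, so the one-liner above relies on every list having exactly two groups (words, then numbers). A hedged sketch of what happens with a third group, next to a variant based on `chain.from_iterable` that accepts any number of groups; `three_groups`, `join_non_digits` and `add_failed` are hypothetical names used only for this illustration.

```python
from itertools import chain, groupby

def join_non_digits(tokens):
    """Join runs of non-digit tokens; works for any number of groups."""
    groups = groupby(tokens, key=str.isdigit)
    return list(chain.from_iterable(
        list(g) if is_digit else [' '.join(g)] for is_digit, g in groups))

# Made-up input with three groups: words, numbers, words again.
three_groups = ['New', 'Jersey', '1', '0', 'West', 'North', 'Central']

try:
    list.__add__(*(list(g) if k else [' '.join(g)]
                   for k, g in groupby(three_groups, key=str.isdigit)))
    add_failed = False
except TypeError:
    # list.__add__ received three lists instead of two
    add_failed = True

print(add_failed)
print(join_non_digits(three_groups))
```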

maddy · Answer 5 · 2018-10-20T03:42:15.247

0

I am basically looping through the strings in list1. If a string is a number, it is appended to list3; otherwise it is a word and is appended to list2. The method isdigit() returns True if the string consists only of digits. Finally, all the contents of list2 are appended to answer as a single string using join, and extend adds all the elements of list3 to the end of answer.

list1=['West', 'North', 'Central', '1', '0', '100', '90']
list2=[]
list3=[]
for i in list1:
    if i.isdigit():
        list3.append(i)
    else:
        list2.append(i)
answer = []
answer.append(' '.join(list2))
answer.extend(list3)
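
The answers in this thread test tokens with either isdigit() or isalpha(), and the two are not exact opposites: a token that mixes letters, digits or punctuation fails both checks. A small sketch with made-up tokens (none of them come from the question) shows how each one is classified:

```python
# Hypothetical tokens chosen to show the edge cases.
tokens = ['Jersey', 'D.C.', '100', '3.5', '-1']

# Map each token to (isalpha, isdigit).
classified = {t: (t.isalpha(), t.isdigit()) for t in tokens}

for token, (alpha, digit) in classified.items():
    print(token, 'isalpha:', alpha, 'isdigit:', digit)
```

So a check based on isalpha() does not treat a token such as 'D.C.' as a word, while a check based on isdigit() does not treat '3.5' or '-1' as a number. Which test fits depends on what the file actually contains.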

edited Oct 20 '18 at 03:42

answered Oct 19 '18 at 17:23

maddy

Could you please add more explanation of your code? Thanks. – hollopost Oct 19 '18 at 17:30
Not sure which part of the code is unclear; could you point out which part needs an explanation? – maddy Oct 19 '18 at 17:54
I know it's clear to professionals, but someone taking their first steps in learning needs more explanation for every line, just a hint about why you wrote each line of code. Thanks again, and welcome to the world of explaining code on Stack Overflow. – hollopost Oct 19 '18 at 17:59
@hollopost I have edited the answer, let me know if it's clear now. Thanks – maddy Oct 19 '18 at 18:17
Thanks for your quick response. I hope I'm not being annoying, but explaining for clarity is one of the rules of Stack Overflow. This is clear now, thanks. – hollopost Oct 19 '18 at 18:22
happy to help. I'll keep it in mind to always explain the code. – maddy Oct 19 '18 at 18:49
Why did you remove the `isdigit` check? – jpp Oct 20 '18 at 00:41

score 0 · Answer 6 · answered Oct 19 '18 at 17:57

I suggest the following steps:

1) Find the indices of the word entries.
2) If two or more consecutive indices hold non-numeric entries, join those entries.

Case:

import re

numeric_regex = re.compile('[0-9]+?') #Regex to find numeric indices 
test = ['New', 'Jersey', '1', '0', '1', '999', 'West', 'North', 'Central', '1', '0']

#Comprehension to find word indices 
word_indices = [idx for idx, x in enumerate(test) if numeric_regex.match(x) is None]

#Comprehension to find indices to merge on
merge_on = [idx for idx, x in enumerate(word_indices) if word_indices[idx-1] == x-1]

At this point I'm stumped on a way to do this without a for loop, so I'll just use a for loop:

reversed_merge_on = reversed(merge_on)
for x in reversed_merge_on:
    test[word_indices[x]-1] = ' '.join(test[word_indices[x]-1:word_indices[x]+1])
    del test[word_indices[x]]

This works for any list of this shape. You can put it into a function and apply it to many lists. The code above works as is, so you can paste it into Python (I'm using 2.7) to see for yourself.
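
One possible way to package these steps as a function is sketched below. The name `merge_words` is made up for this illustration; it works on a copy so the caller's list is left untouched, and it reuses the regex from the code above.

```python
import re

numeric_regex = re.compile(r'[0-9]+?')  # same pattern as in the answer

def merge_words(tokens):
    """Return a copy of tokens with adjacent non-numeric entries merged."""
    tokens = list(tokens)  # copy, so the input list is not modified
    word_indices = [i for i, t in enumerate(tokens)
                    if numeric_regex.match(t) is None]
    # Positions in word_indices whose predecessor is the directly preceding token.
    merge_on = [i for i in range(1, len(word_indices))
                if word_indices[i - 1] == word_indices[i] - 1]
    # Merge from the right so earlier indices stay valid after each deletion.
    for i in reversed(merge_on):
        idx = word_indices[i]
        tokens[idx - 1] = ' '.join(tokens[idx - 1:idx + 1])
        del tokens[idx]
    return tokens

test = ['New', 'Jersey', '1', '0', '1', '999', 'West', 'North', 'Central', '1', '0']
merged = merge_words(test)
print(merged)
```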

vash_the_stampede · Answer 7 · 2018-10-19T22:00:12.183

0

Use a list comprehension to join the non-digit items into a single first element, and then unpack a second list comprehension for the digits.

lst = ['West', 'North', 'Central', '1', '0', '100', '90']
res = [' '.join([i for i in lst if not i.isdigit()]),*[i for i in lst if i.isdigit()]]
print(res)
# ['West North Central', '1', '0', '100', '90']

edited Oct 19 '18 at 22:00

answered Oct 19 '18 at 21:52

vash_the_stampede

what if the list is like `lst = ['1', '0', '100', '90', 'West', 'North', 'Central']` – Khalil Al Hooti Oct 19 '18 at 23:17
@KhalilAlHooti then it still works, did you attempt to plug it in? – vash_the_stampede Oct 19 '18 at 23:27
it returns `['West North Central', '1', '0', '100', '90']`, instead of `['1', '0', '100', '90', 'West North Central']` – Khalil Al Hooti Oct 19 '18 at 23:30
@KhalilAlHooti isnt that the desired output, why would we produce the opposite of the desired output, all demonstrated outputs are organized word first, with no mention that it should be otherwise – vash_the_stampede Oct 19 '18 at 23:37
I think the new list should preserve the order of the original list. OP does not mention that words must be first – Khalil Al Hooti Oct 19 '18 at 23:39
@KhalilAlHooti I respect and understand what you are saying, but I have to tailor my response according to what OP provides and only thing provided was outputs that are in this fashion, assuming for a reason as well – vash_the_stampede Oct 19 '18 at 23:43
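
The difference discussed in these comments can be reproduced directly. The sketch below runs the input from the comment through the answer's expression and through an order-preserving variant built on itertools.groupby; `words_first` and `in_order` are illustrative names, and whether the original order should be kept is an assumption the question does not settle.

```python
from itertools import chain, groupby

lst = ['1', '0', '100', '90', 'West', 'North', 'Central']

# Partition approach from the answer: the joined words always come first.
words_first = [' '.join([i for i in lst if not i.isdigit()]),
               *[i for i in lst if i.isdigit()]]

# Order-preserving variant: only runs of adjacent non-digit tokens are merged.
in_order = list(chain.from_iterable(
    list(g) if is_digit else [' '.join(g)]
    for is_digit, g in groupby(lst, key=str.isdigit)))

print(words_first)
print(in_order)
```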
