I'm trying to extract numbers that are mixed into sentences. I do this by splitting the sentence into a list of words and then iterating through each character of each word to find the digits. For example:

String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []  # collect the digits found, in order
for e in LP:
    for i in e:
        if i in ('123456789'):
            result.append(i)

This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the "List comprehension on a nested list?" post, I understood that the correct code should be:

[i for e in LP for i in e if i in ('123456789') ]

My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.

My original incorrect code, which reversed the order:

[i for i in e for e in LP if i in ('123456789') ]

The result I get from that is:

['3', '3', '3', '3']

Could anyone explain the process that leads to this result please?

martineau
Bowen Liu
  • This is not an answer to your list comprehension question, but this problem can be solved much easier with `[c for c in String if c.isdigit()]`. No need to split the string. – iz_ Nov 26 '18 at 22:07
  • Wow, it really is much better than my approach. What a serendipity for me. Thanks. An additional question based on this: how would you proceed to reorder the words using the list that we got as index? Thanks. – Bowen Liu Nov 26 '18 at 22:11
  • @BowenLiu: that's a new question really. See [Sorting list based on values from another list?](//stackoverflow.com/q/6618515) for the general approach. – Martijn Pieters Nov 27 '18 at 11:31
  • @BowenLiu: and also see [Does Python have a built in function for string natural sort?](//stackoverflow.com/q/4836710), which is basically the same problem, sort a list of strings according to the embedded numbers. – Martijn Pieters Nov 27 '18 at 11:32

2 Answers

Just reverse the same process you found in the other post. Nest the loops in the same order as the for clauses appear in your comprehension:

for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)

The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
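As a minimal sketch of that dependency, assuming a fresh interpreter session in which `e` has never been assigned: the reversed comprehension fails before producing anything, because its first `for` clause needs `e` right away. The names `raised` and `message` are only illustrative.

```python
LP = "is2 Thi1s T4est 3a".split()

raised = False
message = ""
try:
    # The first for clause iterates over e, which does not exist yet
    # in this (assumed fresh) namespace.
    digits = [i for i in e for e in LP if i in '123456789']
except NameError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

The `e` bound by the second `for` clause only exists inside the comprehension, so it cannot supply the value that the first clause is looking for.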

If we presume that e was set to '3a' (the last element in LP, left over from your earlier code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a'; the inner loop runs 4 times again, but now the if test fails.
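That trace can be checked with a small sketch. It assumes `e` was left over as `'3a'`, which is a guess about your session rather than something the question shows. The names `trace`, `reversed_result` and `_word` are made up for illustration.

```python
LP = "is2 Thi1s T4est 3a".split()
e = '3a'  # assumption: leftover value from an earlier loop over LP

# Explicit loops in the same order as the reversed comprehension.
trace = []
for i in e:              # outer loop: the characters '3' and 'a'
    for _word in LP:     # inner loop: runs len(LP) == 4 times per character
        if i in '123456789':
            trace.append(i)

# The reversed comprehension reads the outer e ('3a') in its first clause.
reversed_result = [i for i in e for e in LP if i in '123456789']

print(trace)            # ['3', '3', '3', '3']
print(reversed_result)  # ['3', '3', '3', '3']
```

The inner loop variable never affects the output here. Only `i` is tested and collected, so each digit in `e` is simply repeated once per element of `LP`.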

However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.

I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:

digits = [char for char in String if char.isdigit()]

or even a regular expression:

import re
digits = re.findall(r'\d', String)
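As a quick side-by-side sketch, all three filters give the same list for the sample string from the question. The variable names are illustrative only. One difference to be aware of: the literal `'123456789'` from the question contains no `'0'`, while `str.isdigit()` and `\d` both accept it.

```python
import re

String = "is2 Thi1s T4est 3a"

by_membership = [c for c in String if c in '123456789']  # filter from the question
by_isdigit = [c for c in String if c.isdigit()]
by_regex = re.findall(r'\d', String)

print(by_membership)  # ['2', '1', '4', '3']
print(by_isdigit)     # ['2', '1', '4', '3']
print(by_regex)       # ['2', '1', '4', '3']

# The membership test silently skips zero; the other two do not.
print('0' in '123456789')       # False
print('0'.isdigit())            # True
print(re.findall(r'\d', 'a0'))  # ['0']
```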

and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:

# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.

# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']

# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
Martijn Pieters
  • Thank you so much for your detailed explanation and answer. Every word of your comment is spot-on. I just got home and tried to recreate the situation and get the `NameError: name 'e' is not defined`, which is consistent with what you said. Don't know why the output shows as `['3', '3', '3', '3']` when I did this at work, while the only place where I defined e is in list comprehension. But I will check my code tomorrow. I assigned `e` a value of `3a` and then I got the expected `['3', '3', '3', '3']` result. Now this whole issue is clear to me. Thanks a lot. – Bowen Liu Nov 27 '18 at 04:27
  • About your comment on my initial approach, I did it because I didn't think of a better approach to extract out the number in each word. Your method is much simpler. I totally forgot about `isdigit`. Regex is on my learning list but I have yet to find a chance to actually study it. It looks so different from everything else but seems very efficient. – Bowen Liu Nov 27 '18 at 04:38
  • Thank you so much for your edit Martijn. I haven't learned much about regex so I don't quite understand your code. Could you tell me the difference between `r'\d+'` and `r'\d'` please? It seems to me both are looking for numbers. Google isn't very useful when I search symbols. And how would you go with it if you don't use regex? Thanks again – Bowen Liu Nov 27 '18 at 21:31
  • @BowenLiu: the link in my answer is to the Python regex howto; `+` extends the `\d` match. `\d` matches a single digit, `\d+` matches a sequence of digits, starting with at least one digit. So in `"abc123def"` you'd either match `"1"` or `"123"`. – Martijn Pieters Nov 27 '18 at 22:54
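The difference between `\d` and `\d+` described in the last comment can be seen in a short sketch. The sample strings and variable names are illustrative; `"T42est"` is a made-up word with a two-digit number.

```python
import re

sample = "abc123def"

single = re.search(r'\d', sample).group()   # first single digit only
run = re.search(r'\d+', sample).group()     # the whole run of digits

print(single)  # 1
print(run)     # 123

# With findall the difference shows up as separate digits vs. whole numbers.
each_digit = re.findall(r'\d', "T42est")
whole_number = re.findall(r'\d+', "T42est")

print(each_digit)    # ['4', '2']
print(whole_number)  # ['42']
```

This is why the sort key in the answer uses `\d+`: a word numbered 10 or higher would otherwise be sorted on its first digit only.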

From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):

We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)

import re

test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()

words.sort(key=lambda s: int(re.search(r'\d+', s).group()))

print(words) # ['Thi1s', 'is2', '3a', 'T4est']

To remove the numbers:

words = [re.sub(r'\d', '', w) for w in words]

Final output is:

['This', 'is', 'a', 'Test']
iz_
  • Thanks a lot. Amazing that you know that I will eventually want to get rid of the numbers. I do but I really don't know anything about regex yet. So I will play around to find a way to use key to sort the input string. – Bowen Liu Nov 27 '18 at 04:44
  • Is there any way to achieve it without using regex? I tried using list comprehension in the lambda but didn't get it to work. Thanks. – Bowen Liu Nov 27 '18 at 04:52
  • Replace the lambda with `lambda s: int(''.join(filter(str.isdigit, s)))`. Accomplishes the same thing. – iz_ Nov 27 '18 at 04:54
  • Amazing. That's right, `filter`! Totally forgot about this function too. I need to practice more. Thanks a lot. – Bowen Liu Nov 27 '18 at 04:55
  • Hi Tomothy. I have been studying about sorted key function for the last 2 hours and I think I finally figured out how it works, not sure though. So in `lambda s: int(''.join(filter(str.isdigit, s)))`, s will be the input for the lambda function and it is each element of the list. So what this lambda function does is to extract out the number in each element and then make the list sort using the numbers of the element as key. Am I getting this right? – Bowen Liu Nov 27 '18 at 21:27
  • I was trying to use list comprehension as a key function. But I guess it has to be a function right? Or is there a way to use list comprehension as key for sort or sorted function? Thanks. – Bowen Liu Nov 27 '18 at 21:28
  • Yes, your understanding of the `lambda` is more or less correct. I don't see why you would want `key` as a list comprehension, but yes, it has to be a function. – iz_ Nov 27 '18 at 21:31
  • Thanks. I didn't really want it to be. I just understood it a little better than lambda. But now I learned about new ways to do it. Thanks. – Bowen Liu Nov 27 '18 at 21:33
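To make the comment discussion concrete, here is a rough regex-free sketch of the whole reordering. `digit_key` is a hypothetical name for the lambda from the comments, written as a named function so it can be called on its own. It assumes every word contains at least one digit, because `int('')` would raise a `ValueError` otherwise.

```python
def digit_key(word):
    """Return the digits embedded in word as an int (assumes at least one digit)."""
    return int(''.join(filter(str.isdigit, word)))

words = 'is2 Thi1s T4est 3a'.split()

# sorted() calls the key function once per element and orders by the results.
keys = [digit_key(w) for w in words]
ordered = sorted(words, key=digit_key)

# Strip the digits again with a comprehension instead of re.sub.
cleaned = [''.join(c for c in w if not c.isdigit()) for w in ordered]

print(keys)               # [2, 1, 4, 3]
print(ordered)            # ['Thi1s', 'is2', '3a', 'T4est']
print(' '.join(cleaned))  # This is a Test
```

The comprehension is useful inside the key function, but `key` itself still has to be a callable, as noted above.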