Scrape data and sort it using Python 2.7 and Selenium

Question

I'm trying to scrape data from a website using Selenium and Python 2.7. Here is the HTML for the data I want to scrape:

<textarea>let, either, and, have, rather, because, your, with, other, that, neither, since, however, its, will, some, own, than, should, wants, they, got, may, what, least, else, cannot, like, whom, which, who, why, his, these, been, had, the, all, likely, their, must, our</textarea>

I need to put all of those words into a list and sort it. This is my progress so far:

wordlist = []
data = browser.find_element_by_tag_name("textarea")
words = data.get_attribute('value')  # get_attribute() needs an attribute name
wordlist.append(words)
print words
print wordlist.sort()

Any help or hints would be appreciated.

Can you be more specific in your question? For example, is there an unexpected error occurring? — t_warsop, Dec 19 '18 at 10:13
I'm sorry if my question is not clear enough. When I run that code, printing words works fine, but when I print wordlist after sorting it, the output is None — Bzz21, Dec 19 '18 at 11:01

Andersson · Accepted Answer · score 1 · 2018-12-19T11:16:54.583

Note that wordlist.sort() doesn't return a list; it sorts the existing list in place and returns None, so you need to do:

wordlist.sort()
print wordlist
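A quick way to see the difference between the in-place `list.sort()` and the built-in `sorted()`, using a few words from the question's textarea hard-coded here instead of scraped:

```python
words = ["let", "either", "and", "have", "rather"]

# list.sort() reorders the list in place and returns None
result = words.sort()
print(result)    # None
print(words)     # ['and', 'either', 'have', 'let', 'rather']

# sorted() leaves its argument untouched and returns a new sorted list
original = ["let", "either", "and"]
new_list = sorted(original)
print(original)  # ['let', 'either', 'and']
print(new_list)  # ['and', 'either', 'let']
```

Each `print` call has a single argument, so the snippet behaves the same on Python 2.7 and Python 3.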

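Or, since wordlist only holds one element (the whole textarea string), try the code below, which splits the string into words first, to get the required output: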

# driver is your WebDriver instance (browser in the question)
data = driver.find_element_by_tag_name("textarea")
# the text of a <textarea> is exposed through its 'value' attribute
words = data.get_attribute('value')
# split the comma-separated string into a list of words, then sort it
sorted_list = sorted(words.split(', '))
print sorted_list
# [u'all', u'and', u'because', u'been', u'cannot', u'either', u'else', u'got', u'had', u'have', u'his', u'however', u'its', u'least', u'let', u'like', u'likely', u'may', u'must', u'neither', u'other', u'our', u'own', u'rather', u'should', u'since', u'some', u'than', u'that', u'the', u'their', u'these', u'they', u'wants', u'what', u'which', u'who', u'whom', u'why', u'will', u'with', u'your']
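`split(', ')` assumes every word is separated by exactly one comma and one space. If the textarea might contain extra spaces or line breaks (an assumption; the question's sample is regular), a slightly more tolerant sketch splits on the comma alone and strips each piece. `split_words` is a hypothetical helper name, not part of Selenium:

```python
def split_words(text):
    """Hypothetical helper: split a comma-separated string into trimmed, non-empty words."""
    return [w.strip() for w in text.split(",") if w.strip()]

# stand-in for data.get_attribute('value'), with deliberately irregular spacing
text = "let, either,and,\n have, rather, "
words = split_words(text)
print(sorted(words))  # ['and', 'either', 'have', 'let', 'rather']
```

Filtering on `w.strip()` also drops the empty piece left behind by a trailing comma.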

answered Dec 19 '18 at 11:11 by Andersson, edited Dec 19 '18 at 11:16


This one works. Could you explain what `sorted_list = sorted(words.split(', '))` does? Also, I have a problem when I run my code. Here's the output: `[u'able', u'after', u'almost', u'any', u'because', u'been', u'could', u'did', u'does', u'else', u'ever', u'for', u'get', u'had', u'has', u'have', u'her', u'hers', u'his', u'however', u'into', u'its', u'let', u'like', u'likely', u'may', u'most', u'off', u'other', u'rather', u'say', u'says', u'than', u'that', u'the', u'then', u'these', u'they', u'tis', u'too', u'wants', u'were', u'while', u'whom', u'with', u'would', u'yet', u'you', u'your']` – Bzz21 Dec 19 '18 at 12:59
`words.split(', ')` splits the string at every `', '` and returns a list of words; `sorted` then takes that list and returns a new *sorted* list. If you mean that your problem is the `u` in front of each string, then it's [not really a problem](https://stackoverflow.com/questions/599625/python-string-prints-as-ustring). It's just how a unicode string looks in Python 2.x – Andersson Dec 19 '18 at 13:22
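Printing a list shows each item's `repr()`, which is where the `u'...'` comes from on Python 2. A minimal sketch, with a hard-coded stand-in for the scraped value, that prints the words themselves by joining them into one string. The capitalized `Have` is added only for illustration: the default sort is case-sensitive (uppercase letters sort before lowercase), and a `key` function can ignore case.

```python
# stand-in for the scraped value; Selenium returns unicode strings on Python 2
words = u"let, either, and, Have, rather"
sorted_list = sorted(words.split(u", "))

# printing the list itself shows each item's repr(), hence u'...' on Python 2
print(sorted_list)

# joining the items prints just the words, without quotes or u prefixes
print(u", ".join(sorted_list))  # Have, and, either, let, rather

# case-insensitive order: compare lowercased copies, keep the original strings
print(u", ".join(sorted(sorted_list, key=lambda w: w.lower())))  # and, either, Have, let, rather
```

A `lambda` is used instead of `str.lower` because on Python 2 `str.lower` rejects `unicode` objects.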

score 0 · Answer 2 · answered Dec 19 '18 at 11:14

I was able to recreate your issue using the following code:

words = ["hello", "world", "abc", "def"]

wordlist = []
wordlist.append(words)

print(words)
print(wordlist.sort())

This outputs:

['hello', 'world', 'abc', 'def']
None

I believe this is the issue you are having.

To fix it, I did two things: 1) replaced wordlist.append(words) with wordlist = words.copy(), which copies the list rather than appending the whole list as a single element; and 2) moved wordlist.sort() out of the print call, because sort() sorts the list in place and returns None.

So, the complete updated example is:

words = ["hello", "world", "abc", "def"]

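# copy the list instead of nesting it inside another list
# (list.copy() needs Python 3.3+; on Python 2.7 use list(words) or words[:])
wordlist = words.copy()
# sort in place; sort() itself returns None
wordlist.sort()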

print(words)
print(wordlist)

This now outputs the sorted list, as required:

['hello', 'world', 'abc', 'def']
['abc', 'def', 'hello', 'world']
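The first fix comes down to the difference between `append` and adding a list's items one by one. A short sketch; `list.extend` is an alternative the answer does not use, but it is a standard list method available on Python 2.7 as well:

```python
words = ["hello", "world", "abc", "def"]

# append() adds the whole list as ONE element, giving a nested list
nested = []
nested.append(words)
print(nested)       # [['hello', 'world', 'abc', 'def']]
print(len(nested))  # 1

# extend() adds each element individually, so there is something to sort
flat = []
flat.extend(words)
flat.sort()         # sorts in place and returns None
print(flat)         # ['abc', 'def', 'hello', 'world']
print(words)        # the original is untouched: ['hello', 'world', 'abc', 'def']
```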

The output I get when I print words from the website is not a list. It looks like this is the problem. The output is just like this: `also, could, since, let, least, their, among, have, was, may, across, just, been, our, whom, some` — Bzz21, Dec 19 '18 at 12:48
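A small sketch of the effect described in that comment, with some of its words hard-coded in place of the scraped value: appending a single string gives a one-element list, so there is nothing to sort until the string is split into words.

```python
# stand-in for data.get_attribute('value'), using words from the comment
text = "also, could, since, let, least"

# appending the string gives a list with a single element: the whole string
wordlist = []
wordlist.append(text)
print(len(wordlist))  # 1

# splitting first gives one element per word, which can then be sorted
wordlist = text.split(", ")
wordlist.sort()
print(wordlist)  # ['also', 'could', 'least', 'let', 'since']
```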
