
I have a list of words (equivalent to about two full sentences) and I want to split it into two parts: one part containing 90% of the words and another part containing 10% of them. After that, I want to print a list of the unique words within the 10% list, lexicographically sorted. This is what I have so far:

    pos_90 = (90*len(words)) // 100 #list with 90% of the words
    pos_90 = pos_90 + 1 #I incremented the number by 1 in order to use it as an index
    pos_10 = (10*len(words)) // 100 #list with 10% of the words
    list_90 = words[:pos_90] #Creation of the 90% list
    list_10 = words[pos_10:] #Creation of the 10% list
    uniq_10 = set(list_10) #List of unique words out of the 10% list
    split_10 = uniq_10.split()
    sorted_10 = split_10.sort()
    print(sorted_10)

I get an error saying that split cannot be applied to set, so I assume my mistake must be in the last lines of code. Any idea about what I'm missing here?

Me All
  • What do you expect `uniq_10.split()` to do? – bereal Oct 30 '18 at 18:46
  • Possible duplicate of [Sorting a set of values](https://stackoverflow.com/questions/17457793/sorting-a-set-of-values) – Austin Oct 30 '18 at 18:48
  • I was thinking of separating all the words to have them sorted later, though I understand it might be redundant. In any case, the error I get doesn't have to do with that, I think – Me All Oct 30 '18 at 18:49
  • `uniq_10` is already a set; `split` is a method you call on a string to turn it into a list. – omri_saadon Oct 30 '18 at 18:49
  • Note: As noted in [this answer](https://stackoverflow.com/a/53071119/364696), ignoring your actual exception, your code has a logic error. `pos_10` is an index ~10% of the way into `words`, so `words[pos_10:]` says "give me everything from 10% in through the end", which is ~90% of all the words (the last 90%). So `list_90` ends up being the first ~90% of words, and `list_10` ends up as the last ~90% of words. At no point do you take 10% of the words. – ShadowRanger Oct 30 '18 at 19:03
  • @ShadowRanger I used `list_10 = words[pos_90:]` instead, but I'm still getting more unique words than what I was expecting. Is this last statement wrong or is the way I selected unique words wrong? – Me All Oct 30 '18 at 19:30
  • @MeAll: Without the input, expected output, and actual output part of a [MCVE], I can't answer that. Providing a real [MCVE] serves multiple purposes; minimizing the example often means you identify the problem (and don't have to ask at all), and providing a complete, reproducible error with inputs/outputs is the only way we can help. We can't psychically debug your code. – ShadowRanger Oct 30 '18 at 19:35

1 Answer

`split` only makes sense when converting from one long `str` to a `list` of the components of said `str`. If the input was in the form `'word1 word2 word3'`, yes, `split` would convert that `str` to `['word1', 'word2', 'word3']`, but your input is a `set`, and there is no sane way to "split" a `set` like you seem to want; it's already a bag of separated items.
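As a minimal sketch of that distinction (the sample string is made up for illustration, not taken from the question), `split` is a method of `str`, and a `set` simply has no such method:

```python
# Hypothetical sample input, only for illustration.
text = "word1 word2 word3"

# str.split() breaks one string into a list of its whitespace-separated parts.
parts = text.split()

# A set built from those parts is already a collection of separate items.
unique = set(parts)

# Sets do not define a split method, so unique.split() raises AttributeError.
has_split = hasattr(unique, "split")
print(parts, has_split)
```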

All you really need to do is convert your set back to a sorted list. Replace:

    split_10 = uniq_10.split()
    sorted_10 = split_10.sort()

with either:

    sorted_10 = list(uniq_10)
    sorted_10.sort()  # NEVER assign the result of .sort(); it's always going to be None

or the simpler one-liner that encompasses both listifying and sorting:

    sorted_10 = sorted(uniq_10)  # sorted, unlike list.sort, returns a new list

The final option is generally the most Pythonic way to convert an arbitrary iterable to a `list` and sort it in one step. It doesn't mutate the input, doesn't rely on the input being a specific type (`set`, `tuple`, `list`, it doesn't matter), and it's simpler to boot. You only use `list.sort()` when you already have a known `list` and don't mind mutating it.
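To make the difference between the two options concrete, here is a small sketch; the contents of `uniq_10` are a made-up sample, not the asker's data:

```python
# Hypothetical stand-in for the asker's set of unique words.
uniq_10 = {"pear", "apple", "fig"}

# Option 1: build a list, then sort it in place.
as_list = list(uniq_10)
result = as_list.sort()  # list.sort() mutates as_list and returns None

# Option 2: sorted() accepts any iterable and returns a new sorted list.
sorted_10 = sorted(uniq_10)

print(result)     # None, which is what the original code ended up printing
print(as_list)    # the list sorted in place
print(sorted_10)  # the same words, as a new list
```

Either way the set itself is left untouched; only the list created from it is ordered.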

ShadowRanger
  • @Austin: It's not a dupe though, given that the OP is asking *why* their code doesn't work, not merely "How do I do this?" It's a relevant link, but not a dupe. – ShadowRanger Oct 30 '18 at 18:53
  • @ShadowRanger this works, though I'm getting more words than just the unique words. Should I modify anything else? – Me All Oct 30 '18 at 19:22
  • @MeAll: Did you see [my note about the logic error](https://stackoverflow.com/questions/53070941/using-split-after-a-set-statement-in-python/53071030?noredirect=1#comment93040784_53070941)? `list_10` should be initialized with either `words[:pos_10]` or `words[pos_90:]` (depending on whether it should overlap the contents of `list_90` or not). – ShadowRanger Oct 30 '18 at 19:26
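Putting the comments about the logic error together with the answer, one possible end-to-end version might look like the sketch below. It assumes the two parts should not overlap (so `list_10` is `words[pos_90:]`), and the 20-word `words` list is invented for illustration. Because a slice's end index is exclusive, the sketch does not add 1 to `pos_90`.

```python
# Hypothetical 20-word input; the real words list was not shown in the question.
words = ("the quick brown fox jumps over the lazy dog and "
         "the cat naps by the warm fire until the end").split()

# Index that separates the first ~90% of the words from the rest.
pos_90 = (90 * len(words)) // 100

list_90 = words[:pos_90]  # first ~90% (end index is exclusive)
list_10 = words[pos_90:]  # remaining ~10%, no overlap with list_90

# Deduplicate with set(), then sorted() returns a new, lexicographically ordered list.
sorted_10 = sorted(set(list_10))
print(sorted_10)
```

If the 10% part is meant to come from the start of the list instead, `words[:pos_10]` with `pos_10 = (10 * len(words)) // 100` would be the alternative mentioned in the comment above.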