Select first 20% of list, then next 20% of list

Question

I have a list like this with about 141 entries:

training = [40.0,49.0,77.0,...... 3122.0]

and my goal is to select the first 20% of the list. I did it like this:

testfile_first20 = training[0:int(len(set(training))*0.2)]
testfile_second20 = training[int(len(set(training))*0.2):int(len(set(training))*0.4)]
testfile_third20 = training[int(len(set(training))*0.4):int(len(set(training))*0.6)]
testfile_fourth20 = training[int(len(set(training))*0.6):int(len(set(training))*0.8)]
testfile_fifth20 = training[int(len(set(training))*0.8):]

Is there any way to do this automatically in a loop? This is how I am selecting the folds for k-fold cross-validation.

Thank you.

training[0:len(training)//5]. Been a while since I've used Python, but that should work. It takes the length of training, integer-divides it by five (i.e. 20% of training) and returns that slice of values. Note that a plain `/` returns a float in Python 3, which cannot be used as a slice index. – TheEpicPanic Nov 22 '18 at 15:40
Possible duplicate of [How do you split a list into evenly sized chunks?](https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks) – Abhishek Dujari Nov 22 '18 at 16:04

berkelem · Accepted Answer · 2018-11-27T10:55:37.187


You can use list comprehensions:

div_length = int(0.2*len(training))  # use len(training), not len(set(training)), so duplicate values are counted
testfile_divisions = [training[i*div_length:(i+1)*div_length] for i in range(5)]

This will give you your results stacked in a list:

>>> [testfile_first20, testfile_second20, testfile_third20, testfile_fourth20, testfile_fifth20]

If len(training) does not divide equally into five parts, then you can either have five full divisions with a sixth taking the remainder as follows:

import math

div_length = math.floor(0.2*len(training))  # len(training), not len(set(training)), so duplicate values are counted
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(6)]

or you can have four full divisions with the fifth taking the remainder as follows:

import math

div_length = math.ceil(0.2*len(training))  # len(training), not len(set(training)), so duplicate values are counted
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(5)]
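If neither a leftover sixth chunk nor a shorter fifth chunk is wanted, another option is to spread the remainder so that the part lengths differ by at most one. This is only a minimal sketch and not part of the code above: `split_evenly` is a hypothetical helper name, and the sample list is a stand-in for `training`.

```python
def split_evenly(items, parts=5):
    """Split items into `parts` consecutive slices whose lengths differ by at most one."""
    base, extra = divmod(len(items), parts)
    result = []
    start = 0
    for i in range(parts):
        # The first `extra` slices take one additional element each.
        end = start + base + (1 if i < extra else 0)
        result.append(items[start:end])
        start = end
    return result


# Stand-in data (assumption): 141 values, the length mentioned in the question.
training = [float(x) for x in range(141)]
testfile_divisions = split_evenly(training)
print([len(part) for part in testfile_divisions])  # [29, 28, 28, 28, 28]
```

Because every slice starts where the previous one ends, concatenating the parts gives back the original list in order.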

edited Nov 27 '18 at 10:55

answered Nov 22 '18 at 15:38

berkelem


If I try this I get an error like this: slice indices must be integers or None or have an __index__ method – raffa_sa Nov 22 '18 at 15:41
If I run `for i in range(5): print(len(testfile_divisions[i]))` I get `28 55 82 109 137`, but every part of the list should have the same number of entries – raffa_sa Nov 22 '18 at 15:45
Ah okay. I've corrected the code. I think this should work. – berkelem Nov 22 '18 at 15:47
Just found an error: if `len(training)` is not divisible by 5, I lose some elements, which should not happen @berkelem – raffa_sa Nov 27 '18 at 08:49
I updated the answer. There are two ways you can handle this, either having five full divisions with a remainder or four full divisions with the fifth division being a remainder. – berkelem Nov 27 '18 at 10:56

r.ook · Answer 2 · 2018-11-22T16:01:54.270


Here's a simple take with list comprehension

lst = list('abcdefghijkl')
l = len(lst)

[lst[i:i+l//5] for i in range(0, l, l//5)]

# [['a', 'b'], 
#  ['c', 'd'], 
#  ['e', 'f'], 
#  ['g', 'h'], 
#  ['i', 'j'], 
#  ['k', 'l']]

Edit: Actually now that I look at my answer, it's not a true 20% representation as it returns 6 sublists instead of 5. What is expected to happen when the list cannot be equally divided into 5 parts? I'll leave this up for now until further clarifications are given.

edited Nov 22 '18 at 16:01

answered Nov 22 '18 at 15:51

r.ook


edit: what i was thinking is that 0 to 5 would actually count as 6 sublists including the 0 – Abhishek Dujari Nov 22 '18 at 15:56
@AbhishekDujari then the last element `['k', 'l']` will be missing. The `list` ends up incompletely sliced. – r.ook Nov 22 '18 at 16:00

could be dupe of this https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks – Abhishek Dujari Nov 22 '18 at 16:01
@AbhishekDujari good suggestion, you should tag this thread with the linked question. – r.ook Nov 22 '18 at 16:03

score 0 · Answer 3 · answered Nov 22 '18 at 15:44

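You can loop this by storing the size of a 20% slice and the current starting point in two variables, then advancing the start by that size on each iteration. Note that if len(training) is not divisible by 5, the leftover elements at the end are not included in any part: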

start = 0
twenty_pct = len(training) // 5

parts = []
for k in range(5):
    parts.append(training[start:start+twenty_pct])
    start += twenty_pct

However, I suspect there are numpy/pandas/scipy operations that might be a better match for what you want. For example, scikit-learn includes a class called KFold: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
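Since the goal is k-fold selection, the same slicing idea can be turned into train/test pairs in plain Python. The sketch below is an illustration only and is not scikit-learn's implementation: `k_fold_splits` is a hypothetical name, the sample list stands in for `training`, and letting the last fold absorb the remainder is a choice made here so that no element is dropped.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; each contiguous block is the test set exactly once."""
    fold_size = len(items) // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold runs to the end of the list and so absorbs any remainder.
        end = start + fold_size if fold < k - 1 else len(items)
        test = items[start:end]
        train = items[:start] + items[end:]
        yield train, test


# Stand-in data (assumption): 141 values, the length mentioned in the question.
training = [float(x) for x in range(141)]
for train, test in k_fold_splits(training):
    print(len(train), len(test))
```

With 141 entries the test folds have 28, 28, 28, 28 and 29 elements, and each training set holds the remaining entries.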

score 0 · Answer 4 · answered Nov 22 '18 at 15:47


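Something like this. Because of rounding the parts can differ slightly in length, but no element is dropped, since each part starts exactly where the previous one ends and the last one ends at len(training).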

tlen = float(len(training))    
testfiles = [ training[ int(i*0.2*tlen): int((i+1)*0.2*tlen) ] for i in range(5) ]
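A quick way to see how the rounded boundaries behave is to compute them once and inspect the resulting lengths. This is a small check under the assumption of a 141-element stand-in list; `bounds` is a name introduced here for illustration.

```python
# Stand-in data (assumption): 141 values, the length mentioned in the question.
training = [float(x) for x in range(141)]

tlen = float(len(training))
# Six boundaries delimit five consecutive parts.
bounds = [int(i * 0.2 * tlen) for i in range(6)]
testfiles = [training[bounds[i]:bounds[i + 1]] for i in range(5)]

print(bounds)                             # [0, 28, 56, 84, 112, 141]
print([len(part) for part in testfiles])  # [28, 28, 28, 28, 29]
```

The first boundary is 0 and the last is the full length, so the five parts together cover every entry.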

answered Nov 22 '18 at 15:47

jlanik

