Dynamically creating nested lists of sequential numbers

Question

I have a dataframe which is a subset of another dataframe and contains the following indexes: 45, 46, 47, 51, 52

Example dataframe:

      price  count
45   3909.0      8
46  3908.75      8
47  3908.50      8
51  3907.75      8
52   3907.5      8

I want to make 2 lists, each containing one run of consecutive index values. For example:

list[0] = [45, 46, 47]
list[1] = [51, 52]

Problem: The following code causes this error on the second to last line:

IndexError: list assignment index out of range

        same_width_nodes = df.loc[df['count'] == width]
        i = same_width_nodes.index[0]
        seq = 0
        sequences = [[]]
        sequences[seq] = []

        for index, row in same_width_nodes.iterrows():
            if i == index:
                i += 1
                sequences[seq].append(index)
            else:
                seq += 1
                sequences[seq] = [index]
                i = index

Maybe there's a better way to achieve this, but I'd like to know why I can't create a new item in the sequences list as I am doing here, and how I should be doing it.

Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. — Prune, Feb 16 '21 at 22:51
Presumably we assume your index is in sorted order, and doesn't contain duplicates. — smci, Feb 16 '21 at 23:35
Also in the general case, presumably you want to make N sublists, one for each contiguous range of integers. So your output will be a list-of-lists. — smci, Feb 16 '21 at 23:42
The rule: read the numbers in order; while the current and previous numbers are consecutive, add them to the current sublist; once the sequence breaks, start a new sublist. — Golden Lion, Feb 18 '21 at 16:55

Umar.H · Answer 1 · 2021-02-16T23:29:38.637

In steps.

First, we take the difference between consecutive index values. Any difference other than 1 is flagged as True, and a cumulative sum over those flags gives each run of consecutive values its own group number.

Next, we group by that number with groupby and build the nested list in a list comprehension.

Setup.

import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5], columns=['A'], index=[45, 46, 47, 51, 52])


    A
45  1
46  2
47  3
51  4
52  5

df['grp'] = df.assign(idx=df.index)['idx'].diff().fillna(1).ne(1).cumsum()

idx = [i.index.tolist() for _,i in df.groupby('grp')]

[[45, 46, 47], [51, 52]]
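To see why the diff and cumsum steps produce one group per run, it can help to print the intermediate values. This is a minimal sketch on the example index only; the names `step`, `breaks` and `group_id` are made up for illustration and are not part of the answer.

```python
import pandas as pd

idx = pd.Series([45, 46, 47, 51, 52])

# Difference to the previous value; the first entry has no predecessor (NaN).
step = idx.diff()

# Treat the first entry as "no break", then flag every step that is not 1.
breaks = step.fillna(1).ne(1)

# Each True bumps the running total, so every run gets its own number.
group_id = breaks.cumsum()

print(pd.DataFrame({"idx": idx, "step": step, "breaks": breaks, "group_id": group_id}))
```

Only the jump from 47 to 51 is flagged, so the first three values share one group number and the last two share the next.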

score 2 · Accepted Answer · answered Feb 16 '21 at 23:15

You can use this:

s_index=df.index.to_series()
l = s_index.groupby(s_index.diff().ne(1).cumsum()).agg(list).to_numpy()

Output:

l[0]
[45, 46, 47]

and

l[1]
[51, 52]
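For reference, a self-contained version of the snippet above, assuming the example index from the question. It swaps `.to_numpy()` for `.tolist()` so the result is a plain Python list of lists; that substitution and the name `group_id` are illustrative choices, not part of the answer.

```python
import pandas as pd

# Example frame with the index values from the question.
df = pd.DataFrame({"count": [8] * 5}, index=[45, 46, 47, 51, 52])

s_index = df.index.to_series()

# diff() is NaN for the first row, and NaN != 1, so the first row also
# counts as a break; the group numbers therefore start at 1 rather than 0.
group_id = s_index.diff().ne(1).cumsum()

nested = s_index.groupby(group_id).agg(list).tolist()
print(nested)
```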

Scott Boston

doh', don't know why I didn't use the same boolean condition, long day! nice answer mate – Umar.H Feb 16 '21 at 23:30

score 1 · Answer 3 · answered Feb 16 '21 at 22:55

The issue is with this line

sequences[seq] = [index]

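You are assigning to a list index that does not exist yet. Index assignment can only replace an element that is already in the list; it cannot grow the list. Append the new sublist instead: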

sequences.append([index])
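Putting that fix into the whole loop, a minimal sketch in plain Python might look like the following. The function name `group_consecutive` and the `expected` variable are hypothetical, and the input is assumed to be sorted with no duplicates, as noted in the comments on the question.

```python
def group_consecutive(indexes):
    """Split sorted integers into sublists of consecutive values."""
    sequences = []
    expected = None  # value that would continue the current run
    for index in indexes:
        if index == expected:
            # Continues the current run.
            sequences[-1].append(index)
        else:
            # Run is broken (or this is the first value): start a new sublist.
            sequences.append([index])
        expected = index + 1
    return sequences


print(group_consecutive([45, 46, 47, 51, 52]))
```

With a dataframe, the same function could be called as `group_consecutive(same_width_nodes.index)`, since only the index values are needed and `iterrows()` is not required.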

Agyey Arya

score 0 · Answer 4 · answered Feb 18 '21 at 17:27

I use diff to find where the gap between consecutive index values is greater than 1. I then iterate over the rows as tuples and read the values by position.

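import pandas as pd

index = [45, 46, 47, 51, 52]
price = [3909.0, 3908.75, 3908.50, 3907.75, 3907.5]
count = [8, 8, 8, 8, 8]

df = pd.DataFrame({'index': index, 'price': price, 'count': count})
# Gap to the previous index value; the first row has no predecessor, so use 0.
df['diff'] = df['index'].diff().fillna(0)
print(df)

result_list = [[]]
for row in df.itertuples():
    idx = row[1]   # 'index' column (row[0] is the DataFrame's own index)
    gap = row[4]   # 'diff' column
    if gap <= 1:
        result_list[-1].append(idx)
    else:
        # A gap larger than 1 starts a new sublist at the end of the list.
        result_list.append([idx])

print(result_list)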

output:
[[45, 46, 47], [51, 52]]
