
I have a dataframe which is a subset of another dataframe and contains the following indexes: 45, 46, 47, 51, 52

Example dataframe:

      price  count
45   3909.0      8
46  3908.75      8
47  3908.50      8
51  3907.75      8
52   3907.5      8

I want to make 2 lists, each containing one run of sequential indexes, in this format:

list[0] = [45, 46, 47]
list[1] = [51, 52]

Problem: The following code causes this error on the second to last line:

IndexError: list assignment index out of range

        same_width_nodes = df.loc[df['count'] == width]
        i = same_width_nodes.index[0]
        seq = 0
        sequences = [[]]
        sequences[seq] = []

        for index, row in same_width_nodes.iterrows():
            if i == index:
                i += 1
                sequences[seq].append(index)
            else:
                seq += 1
                sequences[seq] = [index]
                i = index

Maybe there's a better way to achieve this, but I'd like to know why I can't create a new item in the sequences list as I am doing here, and how I should be doing it.

Coder1
  • Can you show a [mcve] of both dataframes? The output is clear. – Umar.H Feb 16 '21 at 22:49
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Feb 16 '21 at 22:51
  • Presumably we assume your index is in sorted order, and doesn't contain duplicates. – smci Feb 16 '21 at 23:35
  • Also in the general case, presumably you want to make N sublists, one for each contiguous range of integers. So your output will be a list-of-lists. – smci Feb 16 '21 at 23:42
  • The rule is: read a sequence of numbers; while the current and previous numbers are sequential, add them to the current list; once the sequence is broken, create a new list of indexes. – Golden Lion Feb 18 '21 at 16:55

4 Answers


In steps.

First we take the difference between consecutive values of your index. Any difference other than 1 is flagged as True, and a cumulative sum of those flags gives each sequence its own group number.

45    0
46    0
47    0
51    1
52    1

Next, we use the groupby method on the new group numbers and build your nested list with a list comprehension.

Setup.

import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5], columns=['A'], index=[45, 46, 47, 51, 52])


    A
45  1
46  2
47  3
51  4
52  5

df['grp'] = df.assign(idx=df.index)['idx'].diff().fillna(1).ne(1).cumsum()

idx = [i.index.tolist() for _,i in df.groupby('grp')]

[[45, 46, 47], [51, 52]]
Umar.H

You can use this:

s_index=df.index.to_series()
l = s_index.groupby(s_index.diff().ne(1).cumsum()).agg(list).to_numpy()

Output:

l[0]
[45, 46, 47]

and

l[1]
[51, 52]
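
To see what the one-liner does step by step, here is a plain-Python sketch of the same diff / ne(1) / cumsum / groupby chain. It is only an analogue for illustration, not pandas itself, and the variable names (`diffs`, `starts`, `labels`, `runs`) are made up here. It assumes the index is sorted and has no duplicates.

```python
from itertools import accumulate, groupby

index = [45, 46, 47, 51, 52]  # index values from the question

# diff(): difference to the previous value; the first element has no predecessor.
diffs = [None] + [b - a for a, b in zip(index, index[1:])]

# ne(1): True wherever a new run starts (the first element always starts one).
starts = [d != 1 for d in diffs]

# cumsum(): a running count of run starts gives one label per run.
labels = list(accumulate(int(s) for s in starts))
print(labels)  # [1, 1, 1, 2, 2]

# groupby(...).agg(list): collect the index values that share a label.
runs = [[value for _, value in group]
        for _, group in groupby(zip(labels, index), key=lambda pair: pair[0])]
print(runs)  # [[45, 46, 47], [51, 52]]
```

The labels start at 1 rather than 0 because the first element has no predecessor and so counts as a run start; only the grouping matters, not the label values.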
Scott Boston

The issue is with this line

sequences[seq] = [index]

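`sequences` holds only one element, so once `seq` becomes 1, `sequences[seq]` refers to a position that does not exist yet. Python lists do not grow on index assignment, which is why you get the IndexError. Append the new list instead: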

sequences.append([index])
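
A minimal sketch of the loop with `append`, followed by a reproduction of the error. The index values come from the question; the helper name `split_consecutive` is hypothetical, and the sketch assumes the indexes are sorted and unique.

```python
def split_consecutive(indexes):
    """Group sorted, unique integers into runs of consecutive values."""
    sequences = []
    expected = None  # value that would continue the current run
    for index in indexes:
        if index == expected:
            # Still consecutive: extend the current (last) run.
            sequences[-1].append(index)
        else:
            # Lists do not grow on index assignment, so start a new run with append.
            sequences.append([index])
        expected = index + 1
    return sequences

result = split_consecutive([45, 46, 47, 51, 52])
print(result)  # [[45, 46, 47], [51, 52]]

# Assigning to a position that does not exist raises the error from the question.
try:
    demo = [[]]
    demo[1] = [51]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # list assignment index out of range
```

Using `sequences[-1]` for the current run also removes the need for a separate `seq` counter.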
Agyey Arya

I use `diff` to find where the gap between consecutive index values is greater than 1, then iterate over the rows as tuples and read each value by position.

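import pandas as pd

index = [45, 46, 47, 51, 52]
price = [3909.0, 3908.75, 3908.50, 3907.75, 3907.5]
count = [8, 8, 8, 8, 8]

df = pd.DataFrame({'index': index, 'price': price, 'count': count})
# Gap to the previous index value; the first row has no predecessor, so use 0.
df['diff'] = df['index'].diff().fillna(0)
print(df)

result_list = [[]]
for row in df.itertuples():
    idx = row[1]   # the 'index' column
    diff = row[4]  # the 'diff' column
    if diff <= 1:
        # Still consecutive: extend the current (last) run.
        result_list[-1].append(idx)
    else:
        # Gap found: start a new run at the end of the list.
        result_list.append([idx])

print(result_list)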

output:
[[45, 46, 47], [51, 52]]
Golden Lion