0

I have a data frame DF1 that contains a start index and end index. I want to use these indices to create multiple slices of another data frame DF2.

I don't want to iterate through the whole dataset, one way I thought was to create an extra column on DF2 that will tell me where each slice starts and ends with an integer that changes on every range. This integer should also be included in DF1. But I am not sure how I would go around doing this or if there is a way to get multiple subsets of the data frame as a list or something.

Some things to consider:

  1. DF1 is always smaller than DF2
  2. I want to try do this without iterrows
  3. I want this to work on variable sizes of DF1 and DF2

Something like this:

In [1]: DF1 = pd.DataFrame([[1, 2], [5, 8], [10, 12]], columns=['StartIndex','EndIndex'])
In [2]: DF2 = pd.DataFrame([1,2,3,4,5,10,11,12,13,14,15,16,17,18,19,20], columns=['SomeValue'])

#The result should be either multiple data frames or something like this:            
#DF2
    SomeValue  Slice
0      1         NA
1      2         1
2      3         1 
3      4         NA
4      5         NA 
5     10         2
6     11         2 
7     12         2
8     13         2
9     14         NA
10    15         3
11    16         3
12    17         3
13    18         NA
14    19         NA
15    20         NA
#DF1
    StartIndex  EndIndex Slice
0        1         2       1
1        5         8       2
2       10        11       3
 

1 Answers1

1

Adapted from the great work here: Mapping ranges of values in pandas dataframe

If you can reconfigure your range inputs, this is relatively straightforward based on the linked article. Otherwise, you'll have to build the criteria list first.

criteria = [df2.index.isin(list(range(1,3))),df2.index.isin(list(range(5,9))),df2.index.isin(list(range(10,13)))]
values = [1, 2, 3]
df2['slice'] =  np.select(criteria, values, 'NA')

Output:

In [37]: df2
Out[37]:
    SomeValue slice
0           1    NA
1           2     1
2           3     1
3           4    NA
4           5    NA
5          10     2
6          11     2
7          12     2
8          13     2
9          14    NA
10         15     3
11         16     3
12         17     3
13         18    NA
14         19    NA
15         20    NA
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14
  • This makes sense, but this would not work if I have variable sizes, how would I make the criteria take x number of rows without manually writing them out. – stackoverlord Nov 20 '20 at 02:24