
I have two dataframes in pandas. DF "A" contains the start and end indexes of zone names. DF "B" contains the start and end indexes of subzones. The goal is to extract all subzones of all zones.

Example:

A:

 start index | end index | zone name 
-----------------------------------
   1         |  10       |    X

B:
 start index | end index | subzone name 
-----------------------------------
   2         |  3        |    Y

In the above example, Y is a subzone of X since its indexes fall within X's indexes.

The way I'm currently doing this is using iterrows to go through every row in A, and for every row (zone) I find the slice in B (subzone). This solution is extremely slow in pandas since iterrows is not fast. How can I do this task without using iterrows in pandas?

Moji

1 Answer


Grouping with dicts and Series is possible: grouping information may exist in a form other than an array. Let's consider another example DataFrame. Since your DataFrames don't contain data, I'm using my own: `mapping` plays the role of DF A and `people` plays the role of DF B, with values that have real-world interpretations:

import numpy as np
import pandas as pd

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan  # Add a few NA values

Now, suppose I have a group correspondence for the columns and want to sum together the columns by group:

mapping = {'a': 'red', 'b': 'red', 'c': 'blue',
           'd': 'blue', 'e': 'red', 'f': 'orange'}
# mapping is a dict that plays the role of DF A (the zones)

You could construct an array from this dict to pass to groupby, but instead we can just pass the dict directly. (I'm sure you can convert a dict to a DataFrame and a DataFrame to a dict, so I'm skipping that step; otherwise you are welcome to ask in the comments.)

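# Note: axis=1 in groupby is deprecated as of pandas 2.1; on newer versions,
# group the transpose instead (people.T.groupby(mapping)) and transpose the result.
by_column = people.groupby(mapping, axis=1)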

I am using sum() here, but you can use whatever aggregation you want. (If you want to combine subzones with their parent zones, you can do that by concatenation; that is out of scope here, otherwise I would have gone into detail.)

by_column.sum()

The same functionality holds for Series, which can be viewed as a fixed-size mapping:
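A minimal sketch of the Series case. The fixed integer values (used instead of random ones so the sums are easy to check) and the name `map_series` are my own assumptions, and I group the transpose rather than pass axis=1 so that it also runs on recent pandas versions:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the `people` frame above (assumption: fixed
# values instead of random ones, so the group sums are easy to verify).
people = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

mapping = {'a': 'red', 'b': 'red', 'c': 'blue',
           'd': 'blue', 'e': 'red', 'f': 'orange'}

# The same correspondence as a Series: index = column label, value = group.
map_series = pd.Series(mapping)

# Group the columns by transposing, grouping the rows, and transposing back.
# The Series is aligned on the column labels; the unused label 'f' is dropped.
by_series = people.T.groupby(map_series).sum().T
print(by_series)
```

Each row of `by_series` then holds one total per colour group, for example a + b + e under `red`.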

Note: mixing functions with arrays, dicts, or Series as group keys is not a problem, since everything gets converted to arrays internally.
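For the zone and subzone frames in the question itself, here is a rough sketch of one way to avoid iterrows. This is a different technique from the groupby above: a cross join followed by a boolean filter. The column names (`start`, `end`, `zone`, `subzone`) and the sample rows beyond X and Y are assumptions of mine, and `how='cross'` needs pandas 1.2 or later:

```python
import pandas as pd

# Hypothetical data: zone X and subzone Y come from the question,
# the other rows are made up to show non-matching cases.
zones = pd.DataFrame({'start': [1, 20], 'end': [10, 30],
                      'zone': ['X', 'Z']})
subzones = pd.DataFrame({'start': [2, 22, 40], 'end': [3, 25, 45],
                         'subzone': ['Y', 'W', 'V']})

# Pair every zone with every subzone (cross join). The shared column
# names get the suffixes _zone and _sub.
pairs = zones.merge(subzones, how='cross', suffixes=('_zone', '_sub'))

# Keep only pairs where the subzone lies entirely inside the zone.
inside = ((pairs['start_sub'] >= pairs['start_zone']) &
          (pairs['end_sub'] <= pairs['end_zone']))
result = pairs.loc[inside, ['zone', 'subzone']].reset_index(drop=True)
print(result)
```

The cross join builds len(A) * len(B) rows before filtering, so this is only reasonable while that product fits in memory; for very large frames an interval-based approach would be a better fit.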