
I'm not sure whether this has been asked before. I have a dataframe with more than 2M rows, and one column identifies the location where each transaction occurred. I am trying to filter it down and create a new dataframe for each location code. I can filter the dataframe, but I can't work out how to write a function that gives each new dataframe a distinct name. Here is the code I have so far:

import pandas as pd

df = pd.DataFrame({'location': [1, 2, 3, 4, 5], 'col2': [234.34, 34.80, 23.65, 24.23, 12.00]})
filter_array = df['location'].unique().tolist()  # location codes to filter on

def new_df_for_columns(df, column, filter_array):
    for value in filter_array:
        # newdf is overwritten on every pass, so only the last filter survives
        newdf = df[df[column] == value]
    return newdf.head()

So in this case, I need "newdf" to get a different name for each newly created dataframe.

lpack
  • Umm... would something like `grouped = df.groupby('location')` and then using `grouped.get_group(3)` (or whatever value) when you need a certain group do what you need here? – Jon Clements Oct 16 '19 at 16:29
  • https://stackoverflow.com/questions/23691133/split-pandas-dataframe-based-on-groupby Is probably what you want. A `dict` is a natural container, so that you can later reference each sub_group via `location`. But why do you need distinct DataFrames? That may be irrelevant depending upon what you plan to do next. – ALollz Oct 16 '19 at 16:31
  • Yes that is perfect @ALollz. I am outputting a pivot table to excel and needed a separate one for each location, so it seemed simplest to write a function that did this instead of hard-coding out each new dataframe. – lpack Oct 16 '19 at 18:56
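
For reference, the `groupby` plus `dict` idea from the comments could look roughly like the sketch below. The sample data (with repeated location codes) and the name `frames_by_location` are made up for illustration; they are not from the question.

```python
import pandas as pd

# Illustrative sample data (assumption): location codes repeat across rows
df = pd.DataFrame({'location': [1, 2, 3, 1, 2],
                   'col2': [234.34, 34.80, 23.65, 24.23, 12.00]})

# One sub-DataFrame per location code, keyed by that code
# (frames_by_location is a hypothetical name)
frames_by_location = {code: group for code, group in df.groupby('location')}

# Look up a single location when it is needed
location_1 = frames_by_location[1]
```

Keying the frames by location code avoids having to invent a distinct variable name for each one; looping over `frames_by_location.items()` then gives each code together with its rows.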

1 Answer


If the location codes are ordered numbers that line up with the dataframe's index, then you may use the index of the dataframe by just typing:

df.reindex(labels)  # labels: a list of index labels that correspond to the location codes

For example, if your data is:

df = pd.DataFrame({'location': [1, 2, 3, 4, 5], 'col2': [234.34, 34.80, 23.65, 24.23, 12.00]}, index=range(5))

And you want to filter locations 3 and 4, then type df.reindex([2, 3]). This does not modify your data: reindex returns a new dataframe and leaves df unchanged.
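
A minimal sketch of that selection, assuming the example data above. The `isin` filter is not part of the answer; it is shown as an alternative that selects on the `location` values themselves, so it does not rely on the index lining up with the codes. The names `by_label` and `by_value` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'location': [1, 2, 3, 4, 5],
                   'col2': [234.34, 34.80, 23.65, 24.23, 12.00]},
                  index=range(5))

# Select by index label: rows 2 and 3 hold locations 3 and 4
by_label = df.reindex([2, 3])

# Alternative (assumption, not from the answer): select by value instead
by_value = df[df['location'].isin([3, 4])]
```

With this data both selections pick the same two rows, and `df` itself still has all five.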

Sajjan Singh
Parinha