I'm still not accustomed to writing functions, so for practice I decided to create one that prints out the first five groups of a groupby object. The problem is that the function I wrote seems to print all of the groups of the groupby object instead of just the first five. I can't figure out the error.

x = Ticket_Names.groupby('Ticket')

def Groupby_func(y):
    a=0
    while a <=5:   #trying to use 'a' as a limiter, 
                   #to limit printing just the first five groups
        for i, j in y:
            print i,j
            a+=1


Groupby_func(x) # calling the function

So instead of printing just the first five groups, it's printing all of them (around 238).

My dataframe looks something like this:

             Ticket       Name
PassengerId                                                           
258          110152     Cherry, Miss. Gladys
505          110152     Maioni, Miss. Roberta
760          110152     Lucy, Noel Martha Dye
586          110413     Taussig, Miss. Ruth
263          110413     Taussig, Mr. Emil
737          6608       Ford, Mrs. Edward 
93           5734       Chaffee, Mr. Herbert
906          5734       Chaffee, Mrs. Herbert 
746          5735       Crosby, Capt. Edward Gifford
541          5735       Crosby, Miss. Harriet 

The groupby groups them by ticket, so in this sample set only 5-6 groups will be created, but in the full dataframe around 230-300 groups are created.

When I run the function above, instead of printing only the first five groups, it prints what seems to be all the groups of the groupby object.

Moondra
  • Doing custom `for` loops on `pandas` dataframes is usually not the best approach. Can you provide a minimal working example of your dataframe and what you need to achieve? Usually it can be done using `pandas` functions. – Dennis Golomazov Nov 02 '16 at 23:55
  • Okay, I've edited the question and created a sample dataframe. – Moondra Nov 03 '16 at 00:26

1 Answer

[g[1] for g in list(Ticket_Names.groupby('Ticket'))[:5]]
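The one-liner above builds a list of every group before slicing off the first five. A possible alternative, sketched here under assumptions, is to stop iterating after five groups with `itertools.islice`. The sample frame `ticket_names` is hypothetical: it is modelled on the table in the question, with shortened names and one made-up row (`"5736"`, `"Doe"`) added so that there are more than five groups. As far as I understand, pandas still works out the group keys up front, so this mainly avoids slicing out a frame for every group.

```python
from itertools import islice

import pandas as pd

# Hypothetical sample data modelled on the table in the question.
# The last row ("5736", "Doe") is made up so that six groups exist.
ticket_names = pd.DataFrame(
    {
        "Ticket": ["110152", "110152", "110413", "6608", "5734", "5735", "5736"],
        "Name": ["Cherry", "Maioni", "Taussig", "Ford", "Chaffee", "Crosby", "Doe"],
    }
)

# Iterating a groupby object yields (key, frame) pairs.
# islice stops after the first five pairs.
first_five = [frame for _, frame in islice(ticket_names.groupby("Ticket"), 5)]

for frame in first_five:
    print(frame)
```

Groups come back in sorted key order by default, so "first five" here means the five smallest ticket values, not the first five that appear in the frame.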

Dennis Golomazov
  • Thank you. Was there anything wrong with my code? Or is it that pandas is very hit or miss with custom `for` loops? – Moondra Nov 04 '16 at 18:55
  • @moondra Your code prints all groups because `for i, j in y` goes through all of them before the `while` condition is checked again. You can fix that by removing the `while` loop and adding a check inside the `for` loop: `if a >= 5: break`. Unrelated to this particular question, this is a [good article](https://www.datascience.com/blog/straightening-loops-how-to-vectorize-data-aggregation-with-pandas-and-numpy/) on vectorization (note: vectorization is NOT what is done in my answer, and is unrelated to the topic; I've just compacted the `for` loop). – Dennis Golomazov Nov 04 '16 at 19:11
  • Thank you so much. I learned a lot. I will take a look at the article you posted. – Moondra Nov 04 '16 at 19:24
  • @moondra glad it helps! – Dennis Golomazov Nov 04 '16 at 19:26
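
For reference, the fix described in the comments (drop the `while` loop and `break` out of the `for` loop) might look like the sketch below. It is written for Python 3, so `print` is a function, unlike in the question. The names `first_groups`, `limit`, and `ticket_names` are hypothetical, and the sample data is modelled on the question's table with one made-up row (`"5736"`, `"Doe"`) so that more than five groups exist. The function returns the groups as well as printing them, which is an addition to make the result easy to check.

```python
import pandas as pd

# Hypothetical sample data modelled on the table in the question.
# The last row ("5736", "Doe") is made up so that six groups exist.
ticket_names = pd.DataFrame(
    {
        "Ticket": ["110152", "110152", "110413", "6608", "5734", "5735", "5736"],
        "Name": ["Cherry", "Maioni", "Taussig", "Ford", "Chaffee", "Crosby", "Doe"],
    }
)


def first_groups(grouped, limit=5):
    """Print and return the first `limit` (key, frame) pairs of a groupby."""
    shown = []
    for key, frame in grouped:
        # Check the counter inside the for loop, so iteration stops early.
        if len(shown) >= limit:
            break
        print(key)
        print(frame)
        shown.append((key, frame))
    return shown


groups = first_groups(ticket_names.groupby("Ticket"))
```

In the original function the `for` loop sits inside the `while` loop, so the whole groupby is consumed on the first pass and `a` only gets compared with 5 after every group has been printed.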