I want to reduce my pandas DataFrame (df) to its first two rows in Python 2.7. Currently, my DataFrame looks like this:

>>> df
            test_number result  Count
21946       140063       NTV    23899
21947       140063       <9.0    1556
21948       140063       <6.0     962
21949       140063       <4.5     871
21950       140063       <7.5     764
21951       140063       <5.4     536

I want it to be like this:

            test_number result  Count
21946       140063       NTV    23899
21947       140063       <9.0    1556

I don't just want to limit the displayed output; I want to reduce the DataFrame itself to those rows.

Shubham Namdeo

3 Answers

Use the integer-location indexer, .iloc:

df.iloc[:2]
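
As a minimal sketch of how this plays out on the question's data: the frame construction below is an assumption made for illustration (values copied from the post), and `head(2)` is shown only as an alternative that selects the same rows.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame(
    {
        "test_number": [140063] * 6,
        "result": ["NTV", "<9.0", "<6.0", "<4.5", "<7.5", "<5.4"],
        "Count": [23899, 1556, 962, 871, 764, 536],
    },
    index=range(21946, 21952),
)

# Position-based slice: keeps the first two rows regardless of index labels.
first_two = df.iloc[:2]

# head(2) selects the same two rows.
same_rows = df.head(2)

# Rebind the name so that `df` itself refers to the reduced frame.
df = first_two
print(df)
```

Rebinding `df` is what reduces the frame itself rather than only what gets displayed.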
Ted Petrou

This should do it

df = df.iloc[:2, :]
Alex O
  • I got this warning: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame." But it gives the required result. Thanks. – Shubham Namdeo Dec 06 '16 at 01:26
  • Interesting. It thinks that a chained assignment is happening. That would have everything to do with what you were assigning the result of the iloc to. SettingWithCopy is a very nasty error when it happens, and I wouldn't take chances with it. The warning is also not 100% predictable (it doesn't always warn you when you're doing something scary). I'd definitely give this a read if you have not already: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy – Alex O Dec 06 '16 at 02:16
  • Okay, I will read the link provided by you, and thanks for the detailed explanation for the Warning. I will keep it in mind. – Shubham Namdeo Dec 06 '16 at 03:00
  • FYI, here is a great explanation of chained assignment by a pandas maintainer: http://stackoverflow.com/questions/21463589/pandas-chained-assignments – Alex O Dec 06 '16 at 03:08
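
One common way to sidestep the warning discussed in these comments is to take an explicit copy of the slice. The following is a minimal sketch, assuming a small frame rebuilt from the question's first three rows; the `original` name is hypothetical and is there only to show that the full frame is unaffected.

```python
import pandas as pd

# Small stand-in frame using the first three rows from the question.
df = pd.DataFrame(
    {
        "test_number": [140063] * 3,
        "result": ["NTV", "<9.0", "<6.0"],
        "Count": [23899, 1556, 962],
    },
    index=[21946, 21947, 21948],
)
original = df  # hypothetical handle on the full frame, kept for comparison

# .copy() makes the reduced frame independent of the original data.
df = df.iloc[:2, :].copy()

# Assigning into the copy is unambiguous and leaves the original untouched.
df.loc[21946, "Count"] = 0
print(df)
print(original.loc[21946, "Count"])
```

Whether the warning appears without `.copy()` depends on the pandas version and on what is done with the slice afterwards, so treat this as a defensive habit rather than a required step.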

I will present three approaches to the referenced question:

  1. Split your DataFrame and keep the first half:
import pandas as pd
df = pd.read_csv('mydata.csv')
half_len = len(df) // 2           # Number of rows in the first half
first_half = df.iloc[:half_len]   # Assign the first half to a new object
len(first_half)                   # Confirm that it really is half the length
  2. Create a random sample of the data based on a percentage, using the DataFrame.sample() method:
import pandas as pd
df = pd.read_csv('mydata.csv')
sampled_df = df.sample(frac=0.3)  # Get a random 30% of the rows
len(sampled_df)                   # Check the length
  3. Select a specific number of random rows, using the DataFrame.sample() method:
import pandas as pd
df = pd.read_csv('mydata.csv')
specific_rows = df.sample(n=40)  # Select 40 random rows from the dataset
print(specific_rows)
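
To compare the three approaches without needing a CSV file, here is a rough sketch on a made-up frame. The 100-row `value` column is a hypothetical stand-in for 'mydata.csv', and `random_state=0` is an added assumption that makes the random samples repeatable.

```python
import pandas as pd

# Hypothetical stand-in for the data read from 'mydata.csv'.
df = pd.DataFrame({"value": range(100)})

# First half by position: rows keep their original order.
first_half = df.iloc[: len(df) // 2]

# Random 30% of the rows; random_state fixes the selection.
fraction = df.sample(frac=0.3, random_state=0)

# Exactly 40 random rows, drawn without replacement by default.
fixed_n = df.sample(n=40, random_state=0)

print(len(first_half), len(fraction), len(fixed_n))
```

Note that only the positional slice returns the leading rows; `sample()` picks rows at random, so it reduces the size of the frame but does not keep the first rows in particular.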
  • Thanks for the answer after all this time. The problem was solved long ago, but it is good to have a new solution for future readers. Please upvote the question as well if it looks good to you. – Shubham Namdeo Apr 18 '22 at 11:42