I can't seem to see what is wrong here. I have a Pandas Series of length 16110 and a DataFrame of shape (13, 16116). I am simply trying to trim the DataFrame to the length of my Series.

I thought this would be a simple matter of:

df = df[:len(series)]

This code runs without error, but it doesn't seem to shorten the DataFrame at all.

Am I missing something here?

2 Answers

Apparently you want DataFrame.head.

It's a convenience method; basically head(self, n) returns self.iloc[:n].

Please note that this does not cut the original frame. As far as I can tell, it returns a view (subset) of the original frame. Some kinds of slicing return copies; it depends on context and is not easy to predict.
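A minimal sketch of the `head`/`iloc` equivalence, assuming small frames filled with placeholder data (not the asker's data). Both `head(n)` and `iloc[:n]` count rows, so on a frame shaped (13, 16116) neither of them drops any columns.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 10 rows, 5 columns of random values
df = pd.DataFrame(np.random.randn(10, 5))

# head(n) selects the first n rows, the same as positional slicing with iloc
first_three = df.head(3)
assert first_three.equals(df.iloc[:3])
print(first_three.shape)  # (3, 5)

# On a wide frame with fewer rows than n, head(n) keeps every row and column
wide = pd.DataFrame(np.zeros((13, 16116)))
print(wide.head(16110).shape)  # (13, 16116)
```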

9000
  • Makes sense, so I'm extending it. – 9000 Apr 06 '18 at 15:07
  • This also won't get the asker what they're looking for... – RCA Apr 06 '18 at 15:08
  • @RCA: I think anything `iloc`-based won't actually cut the frame. But slicing of a regular Python list won't cut it either, so the expected behavior of `foo = bar[:bound]` is "giving a shorter representation" and not "freeing up unused parts" anyway. – 9000 Apr 06 '18 at 15:17
  • Your longer answer makes sense. – RCA Apr 06 '18 at 15:21
  • This answer will not work with the input provided. The df is of shape (13, 16116), so we need to slice columns, not rows. With the input provided, the solution would be `df.iloc[:, :len(series)]`. – cdwoelk Apr 06 '18 at 15:22
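A sketch of the column-based fix suggested in the comment above, assuming a zero-filled frame and series with the sizes given in the question. Slicing returns a new object rather than cutting the frame in place, so the result has to be assigned back; the original object keeps all 16116 columns.

```python
import numpy as np
import pandas as pd

# Stand-ins with the sizes from the question (values are placeholders)
df = pd.DataFrame(np.zeros((13, 16116)))
series = pd.Series(np.zeros(16110))

# The second position in iloc selects columns; ":" keeps every row
trimmed = df.iloc[:, :len(series)]
print(trimmed.shape)  # (13, 16110)

# The original object is unchanged; rebind the name to keep the result
print(df.shape)  # (13, 16116)
df = trimmed
```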

Here is another way, using df.truncate:

df = df.truncate(after=len(series)-1)

Example:

>>> df
          0         1         2         3         4
0 -0.615868  0.367161  0.138472 -0.353085  0.953871
1  0.063501 -0.256693  0.895870  0.368182  0.156447
2 -0.148034 -0.572105 -3.030083  1.092318 -2.635359
3 -1.038899  1.198679  2.633639 -0.149085 -1.574603
4 -2.639766  1.377038 -1.263696 -1.999058 -1.540654
5  1.683478 -0.403260 -1.551362 -0.007200  0.240715
6  1.033099  0.659052 -0.306415  0.086918 -1.523796
7 -1.514313  0.117010  0.490440  0.497393  0.123755
8  0.078399  0.218355 -0.255076 -0.474265 -0.430907
9  0.868665  1.917818  1.303568  1.772729 -0.446849

>>> series
0    0.311083
1    0.498019
2   -0.671698
dtype: float64

>>> df.truncate(after=len(series)-1)
          0         1         2         3         4
0 -0.615868  0.367161  0.138472 -0.353085  0.953871
1  0.063501 -0.256693  0.895870  0.368182  0.156447
2 -0.148034 -0.572105 -3.030083  1.092318 -2.635359
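`truncate` works on index labels rather than positions, so `after=len(series)-1` assumes the default 0-based integer labels. It also takes an `axis` argument, so a sketch of truncating columns instead of rows, assuming placeholder data in the shape from the question, could look like this:

```python
import numpy as np
import pandas as pd

# Placeholder data with the question's shape; column labels are 0..16115
df = pd.DataFrame(np.zeros((13, 16116)))
series = pd.Series(np.zeros(16110))

# Keep columns whose labels are <= len(series) - 1 (label-based, inclusive)
trimmed = df.truncate(after=len(series) - 1, axis="columns")
print(trimmed.shape)  # (13, 16110)
```

Because `after` is inclusive and label-based, this matches `df.iloc[:, :len(series)]` only while the column labels are the default `0..n-1`.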

Although, just to note, I can't reproduce your problem. For instance:

df[:len(series)]

Returns a truncated df as well:

>>> df[:len(series)]
          0         1         2         3         4
0 -0.615868  0.367161  0.138472 -0.353085  0.953871
1  0.063501 -0.256693  0.895870  0.368182  0.156447
2 -0.148034 -0.572105 -3.030083  1.092318 -2.635359
sacuL
  • Maybe check `df.shape`. I think slicing the columns, not the rows, is the key. – BENY Apr 06 '18 at 15:12
  • I'm not sure I understand. Also, starting with a `df` of shape `(10,5)` and a series of `len` 3, I end up with a truncated `df` of shape `(3,5)` using `df[:len(series)]` (see edited post), so it seems to be truncating the rows and not the columns... Maybe I'm missing something though... – sacuL Apr 06 '18 at 15:19
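A minimal reproduction of why the original `df[:len(series)]` appears to do nothing, assuming placeholder data with the shapes from the question. Slicing a DataFrame with `df[:n]` selects rows, so a frame with only 13 rows comes back whole when n is 16110, which is the point of the row-versus-column comment above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((13, 16116)))  # 13 rows, 16116 columns
series = pd.Series(np.zeros(16110))

# df[:n] slices rows, not columns
same = df[:len(series)]
print(same.shape)  # (13, 16116): nothing was removed

# len(df) counts rows; the column count is df.shape[1]
print(len(df), df.shape[1])  # 13 16116
```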