
Is there a better way to keep only the columns I need in a DataFrame when the columns to keep are few and the columns to remove are many?

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(10, 99, size=(13, 26)), columns=list('abcdefghijklmnopqrstuvwxyz'))
df1

Output:

     a   b   c   d   e   f   g   h   i   j   ... q   r   s   t   u   v   w   x   y   z
0   78  60  27  38  21  93  74  47  16  53  ... 79  56  40  41  87  80  14  82  12  50
1   84  73  59  46  91  43  22  28  57  52  ... 27  65  81  72  68  90  68  61  22  44
2   56  37  29  52  57  14  87  82  46  90  ... 67  57  29  14  55  30  46  72  56  91
3   86  44  46  79  41  74  32  49  42  32  ... 33  34  40  17  30  78  29  75  80  52
4   14  89  90  79  67  17  34  39  57  37  ... 93  49  78  91  26  73  40  48  91  36
5   16  62  32  87  56  81  82  17  59  57  ... 84  24  97  39  46  40  68  53  73  40
6   69  72  16  47  37  20  27  56  13  37  ... 10  28  17  35  39  14  51  85  69  53
7   81  34  35  20  66  44  86  23  94  57  ... 38  45  76  53  82  72  64  34  81  43
8   95  90  97  31  18  85  74  18  43  22  ... 20  20  96  25  53  76  55  96  58  98
9   73  53  72  94  55  33  22  40  11  64  ... 84  66  85  34  94  32  78  72  10  62
10  73  24  57  17  63  24  94  25  59  84  ... 34  45  27  28  47  23  38  80  45  41
11  69  18  22  42  95  38  16  47  68  36  ... 59  69  35  39  78  75  85  86  53  55
12  46  27  53  77  48  15  57  90  32  57  ... 32  79  18  67  71  86  54  11  36  51
13 rows × 26 columns

Say I only need to keep a few arbitrary columns, e.g. e, u, r, q, j. Is there a better way to keep them than running df1.drop() with the other 21 column names passed in? I could not find one in any of the existing questions.

Edit: This is different from the solution in Selecting multiple columns in a pandas dataframe, because the columns I want to keep or drop are scattered, not sequential.

Scott Boston
Anil Menon
  • Does this answer your question? [Selecting multiple columns in a pandas dataframe](https://stackoverflow.com/questions/11285613/selecting-multiple-columns-in-a-pandas-dataframe) – Shubham Sharma Jun 10 '20 at 16:05
  • @ShubhamSharma, this would work if the columns I wanted to drop were placed sequentially. The real example I am working with has several hundred columns, and they are all over the place. – Anil Menon Jun 10 '20 at 16:19
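Selecting by label does not depend on the columns being adjacent. Below is a minimal sketch using the question's own example and its scattered keep-list (e, u, r, q, j). A boolean mask built with Index.isin keeps the matching columns in their original DataFrame order, and any name in the list that does not exist is simply not matched. The names columns_to_keep and kept are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Same shape as the question's example: 13 rows, columns 'a'..'z'
df1 = pd.DataFrame(np.random.randint(10, 99, size=(13, 26)),
                   columns=list("abcdefghijklmnopqrstuvwxyz"))

# The scattered columns the question wants to keep (illustrative name)
columns_to_keep = ["e", "u", "r", "q", "j"]

# Boolean mask over the column labels: True where the label is in the list.
# Selecting with .loc keeps the columns in their original DataFrame order.
kept = df1.loc[:, df1.columns.isin(columns_to_keep)]

print(list(kept.columns))  # ['e', 'j', 'q', 'r', 'u']
```

The same mask works unchanged whether the frame has 26 columns or several hundred, since it never lists the columns to drop.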

2 Answers

You can copy all the columns you want to keep into a new dataframe and then overwrite your first dataframe, like so:

    import numpy as np
    import pandas as pd        
    df1 = pd.DataFrame(np.random.randint(10,99, size=(13, 26)), columns =list('abcdefghijklmnopqrstuvwxyz'))
    df2 = pd.DataFrame()
    columns_to_keep = ["e", "r", "u"]
    for column in columns_to_keep:
        df2[column] = df1[column]
    df1 = df2
    df1

Alternatively, use a for loop to drop every column that is not in the list:

    columns_to_keep = ["e", "r", "u"]
    # Loop over a snapshot of the labels; DataFrame.iteritems() was removed in pandas 2.0
    for column_name in list(df1.columns):
        if column_name not in columns_to_keep:
            df1 = df1.drop(column_name, axis=1)
    df1
Ben George
  • Thanks @Ben. This is a good solution that works. I am curious to see if there is an in-place solution, since this one creates a new dataframe. .drop() seems to have an option to do this in place. Good solution, nonetheless! – Anil Menon Jun 10 '20 at 16:24
  • The edited solution (the 2nd one) looks better and addresses my comment above. Thanks! – Anil Menon Jun 10 '20 at 16:25
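The loop in the second snippet can be collapsed into a single drop call by computing the complement of the keep-list with Index.difference. Since the comments ask about an in-place version, this sketch passes inplace=True; note that the loop's df1 = df1.drop(...) reassigns a new object rather than modifying df1 in place. The name columns_to_drop is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(10, 99, size=(13, 26)),
                   columns=list("abcdefghijklmnopqrstuvwxyz"))
columns_to_keep = ["e", "r", "u"]

# Every label NOT in columns_to_keep (Index.difference returns them sorted)
columns_to_drop = df1.columns.difference(columns_to_keep)

# One drop call instead of a loop; inplace=True mutates df1 and returns None
df1.drop(columns=columns_to_drop, inplace=True)

print(list(df1.columns))  # ['e', 'r', 'u']
```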

Just select the columns you want with a list of labels and assign the result back to df1:

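import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(10, 99, size=(13, 26)), columns=list('abcdefghijklmnopqrstuvwxyz'))
columns_to_keep = ["e", "r", "u"]
# Indexing with a list of labels returns only those columns, in list order
df1 = df1[columns_to_keep]
df1.head()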

Output:

    e   r   u
0  65  95  13
1  58  42  75
2  95  34  12
3  43  20  79
4  83  27  47
Scott Boston
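With several hundred columns, a name in the keep-list can easily be misspelled or absent. A minimal sketch contrasting the list-selection answer above, which raises KeyError for a missing label, with DataFrame.filter(items=...), which keeps only the requested labels that exist. The label "zz" and the names wanted, missing_label_raised and filtered are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(10, 99, size=(13, 26)),
                   columns=list("abcdefghijklmnopqrstuvwxyz"))

# "zz" is a hypothetical label that does not exist in df1
wanted = ["e", "r", "u", "zz"]

# Plain list selection fails if any requested label is missing
try:
    df1[wanted]
    missing_label_raised = False
except KeyError as exc:
    missing_label_raised = True
    print("KeyError:", exc)

# DataFrame.filter(items=...) silently skips labels that are not present
filtered = df1.filter(items=wanted)
print(list(filtered.columns))
```

Which behavior is preferable depends on whether a missing column should be treated as an error or ignored.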