I have a list of DataFrames that have the same columns and different values. I want to drop some columns from the list of DataFrames in one line in pandas.

So far, I have tried the following (dfs is a list of DataFrames):

dfs.drop([col for col in ['var1', 'var2'] if col in dfs], axis=1, inplace=True)

and

dfs[dfs.drop([col for col in ['var1', 'var2'] if col in dfs], axis=1, inplace=True)]

Both give the same error:

AttributeError: 'list' object has no attribute 'drop'

type(dfs)
>> list

However, when I loop through each DataFrame in the list dfs with a for loop, I can drop the columns.

How can I do this with a list comprehension in pandas?

cs95
i.n.n.m
  • But `dfs` is list, not a single dataframe. – Willem Van Onsem Jul 19 '17 at 19:34
  • A list comprehension is not the idiomatic way to solve this problem. – cs95 Jul 19 '17 at 19:34
  • `dfs` is a list of DataFrames – i.n.n.m Jul 19 '17 at 19:34
  • @COLDSPEED okay, I was wondering if I could do in list comprehension. Thanks for the suggestion, I will continue to use the regular `for` loop – i.n.n.m Jul 19 '17 at 19:35
  • @i.n.n.m Not that you can't do it... you _can_. But why? – cs95 Jul 19 '17 at 19:36
  • @COLDSPEED because I have about 20 different dataframes in `dfs`, I wanted to look at some dataframes with the variables that I am interested in (`'var3', 'var4'`) while dropping the others (`'var1', 'var2'`). And then I would drop `var3` and look at the rest together. This is why I wanted to drop some columns from the dataframes in `dfs` – i.n.n.m Jul 19 '17 at 19:40

1 Answer

Assuming you want to drop ['var1', 'var2'] columns, and your data frames have the same columns, you should use a for loop.

for df in dfs:
    df.drop(['var1', 'var2'], axis=1, inplace=True)

Alternatively, you could also use this:

dfs = [df.drop(['var1', 'var2'], axis=1) for df in dfs]

Omitting the inplace=True will cause df.drop to return a new dataframe, rather than updating inplace and returning None.
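As a rough sketch of the non-inplace variant, here is a small self-contained example. The sample frames and the names `to_drop` and `trimmed` are made up for illustration. The second frame deliberately lacks `var2`, and `errors="ignore"` is an optional extra (not part of the answer above) that skips labels a frame does not have, which is what the `if col in dfs` guard in the question seems to be aiming for.

```python
import pandas as pd

# Hypothetical sample data; the second frame has no 'var2' column.
dfs = [
    pd.DataFrame({"var1": [1, 2], "var2": [3, 4], "var3": [5, 6]}),
    pd.DataFrame({"var1": [7, 8], "var3": [9, 10]}),
]

to_drop = ["var1", "var2"]

# Without inplace=True, drop() returns a new DataFrame each time,
# so the comprehension builds a new list and has no side effects.
# errors="ignore" skips labels that a given frame does not contain.
trimmed = [df.drop(columns=to_drop, errors="ignore") for df in dfs]

print([list(df.columns) for df in trimmed])  # [['var3'], ['var3']]
print([list(df.columns) for df in dfs])      # original frames are unchanged
```

If every frame is guaranteed to have the columns, `errors="ignore"` can be left out so that a typo in a column name raises a `KeyError` instead of being silently skipped.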

cs95
  • Will this work? Isn't the loop iterating over a copy of the list (of dataframes) instead of the actual list itself? – Adarsh Chavakula Jul 19 '17 at 19:40
  • @AdarshChavakula When working with mutable objects like dataframes, you're working directly with the reference, and not a copy. It does indeed work. – cs95 Jul 19 '17 at 19:40
  • As an exercise, create `df`, then assign `df` to `df1`. Try dropping a column in `df1`, and the change is reflected in `df` too. – cs95 Jul 19 '17 at 19:41
  • @COLDSPEED Thanks for that :) Need to learn what mutable and non-mutable objects are. – Adarsh Chavakula Jul 19 '17 at 19:42
  • @COLDSPEED thank you, `dfs = [df.drop(['var1', 'var2'], axis=1) for df in dfs]` list comprehension way works just fine :) – i.n.n.m Jul 19 '17 at 19:46
  • @i.n.n.m I still stand by the loop, but hey, whatever gets the job done, right? ;) More power to you. – cs95 Jul 19 '17 at 19:48
  • @COLDSPEED agreed, I wanted to try the list comprehension method too for future purposes! – i.n.n.m Jul 19 '17 at 19:49
  • I agree that the list comprehension with the inplace argument is non-idiomatic but for a list of DataFrames, your last suggestion is good. The argument itself is a little misleading but it actually creates a copy under the hood so no memory savings unfortunately. – ayhan Jul 19 '17 at 19:56
  • @ayhan Thank you, that was illuminating! I've edited my post. – cs95 Jul 19 '17 at 19:58
  • `inplace=True` is misleadingly named -- internally, the method creates a new (sub)-DataFrame then calls [`_update_inplace`](https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L2279) which copies the resultant DataFrame's `_data` back into the calling DataFrame's `_data` attribute. The `inplace=True` option persists for backwards-compatibility, but because of its misleading nature, [probably should not be recommended for use going forward](http://stackoverflow.com/a/22533110/190597). – unutbu Jul 19 '17 at 20:20
  • @cs95 is there any technical reason for suggesting the loop instead of the list comprehension (time, memory, etc.)? – Andrea Ciufo Mar 06 '20 at 09:42
  • @AndreaCiufo I need to remove the list comprehension from the answer, list comps are not meant to be used to trigger side effects (as is being done here). – cs95 Mar 06 '20 at 15:54
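
The exercise suggested in the comments (assign `df` to `df1`, then drop a column through `df1`) can be tried with something like the sketch below. The sample data and the names `same_object` and `frame` are made up for illustration. It also shows why the `for` loop in the answer updates the frames held in the list: the loop variable refers to the same objects, not to copies.

```python
import pandas as pd

df = pd.DataFrame({"var1": [1, 2], "var3": [3, 4]})
df1 = df  # a second name for the same object, not a copy

df1.drop(columns=["var1"], inplace=True)

same_object = df1 is df
print(same_object)       # True
print(list(df.columns))  # ['var3'], the change is visible through both names

# The same reasoning applies to the loop from the answer:
# 'frame' is each DataFrame stored in the list, so the list's frames change.
dfs = [pd.DataFrame({"var1": [1], "var3": [2]}) for _ in range(2)]
for frame in dfs:
    frame.drop(columns=["var1"], inplace=True)

print([list(frame.columns) for frame in dfs])  # [['var3'], ['var3']]
```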