I'm working with fairly large datasets that are close to the size of my available memory. I want to select a subset of columns by name and then save that subset. I don't think I can use regular slice notation such as :2, because the columns I need are not necessarily contiguous, so I have to select by label or by position. However, the only ways I have found to do this produce a copy, which increases memory usage considerably whenever I save a subset of the data. Is it possible to select a view without using slices? Or is there some creative way to use slices that would let me select arbitrarily located columns?
Consider the following:
import pandas as pd
df = pd.DataFrame([[1, 2, 1], [3, 4, 1]], columns=list('abc'))
# you can get a view using :2 slicing
my_slice = df.iloc[:, :2]
my_slice.iloc[0, 0] = 100
df
     a  b  c
0  100  2  1
1    3  4  1
my_slice
     a  b
0  100  2
1    3  4
This returns a view and hence doesn't copy, but I had to index by slicing.
Now I try alternatives.
my_slice = df.iloc[:, [0, 1]]
my_slice.iloc[0, 0] = 99
my_slice
    a  b
0  99  2
1   3  4
df
     a  b  c
0  100  2  1
1    3  4  1
Or, selecting by label:
my_slice = df.loc[:, ['a', 'b']]
my_slice.iloc[0, 0] = 55
my_slice
    a  b
0  55  2
1   3  4
df
     a  b  c
0  100  2  1
1    3  4  1
Thus, the last two attempts each returned a copy: modifying my_slice left df unchanged. Again, this is just a simple example. In reality, I have many more columns, and the locations of the columns I want to save may not be amenable to slicing. This post is related, as it discusses selecting columns from dataframes, but it doesn't focus on being able to select views.
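The view-versus-copy distinction above is inferred from whether a write to my_slice shows up in df. A more direct check, sketched here under assumptions, is to ask NumPy whether the underlying arrays overlap in memory. The helper name shares_data_with is hypothetical, and the check assumes a single-dtype frame like the example, where to_numpy() can expose the existing buffer instead of building a new array. Results may differ for mixed-dtype frames or across pandas versions.

```python
import numpy as np
import pandas as pd


def shares_data_with(parent, subset):
    """Hypothetical helper: True if the two frames' arrays overlap in memory."""
    # Assumption: both frames hold a single dtype, so to_numpy() can hand
    # back the existing buffer rather than a freshly assembled array.
    return bool(np.shares_memory(parent.to_numpy(), subset.to_numpy()))


df = pd.DataFrame([[1, 2, 1], [3, 4, 1]], columns=list('abc'))

# The same three selections as in the question, checked without mutating df.
results = {
    "iloc slice": shares_data_with(df, df.iloc[:, :2]),
    "iloc list": shares_data_with(df, df.iloc[:, [0, 1]]),
    "loc labels": shares_data_with(df, df.loc[:, ['a', 'b']]),
}

for label, shared in results.items():
    print(f"{label}: shares memory with df = {shared}")
```

Unlike the assignment test, this check leaves df untouched, so it can be run on the real dataset before committing to a selection strategy.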