I am loading data whose size is comparable to my memory limit, so I am careful about indexing efficiently and avoiding copies. I need to work on columns 3:8 and 9: (the columns are also labeled), but combining the two ranges in a single indexer does not seem to work. Rearranging the columns in the underlying data is needlessly costly (an IO operation). Selecting two DataFrames and combining them also sounds like something that would make copies. What is an efficient way to do this?
import numpy as np
import pandas as pd
data = pd.read_stata('S:/data/controls/lasso.dta')
# Join the two column blocks side by side (axis=1); without axis=1, concat stacks rows
X = pd.concat([data.iloc[:, 3:8], data.iloc[:, 9:888]], axis=1)
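For illustration, here is a minimal sketch of the single-indexer selection I am after. It assumes `np.r_` is an acceptable way to build the positions, and it uses a small toy frame (hypothetical, 12 columns) in place of the real data. As far as I understand, selecting with an integer array still materializes a copy of the chosen columns, so this may only tidy the syntax rather than solve the memory concern.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (hypothetical: 3 rows, 12 columns)
data = pd.DataFrame(
    np.arange(36).reshape(3, 12),
    columns=[f"c{i}" for i in range(12)],
)

# np.r_ concatenates slice ranges into one array of integer positions:
# positions 3..7 and 9..last, skipping column 8
cols = np.r_[3:8, 9:data.shape[1]]

# One positional selection instead of two selections plus pd.concat
X = data.iloc[:, cols]
```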
By the way, if I could read in only half of my data (even a random half), that would help. Again, I would rather not open the original data and save another, smaller copy just for this.
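To make the second request concrete, here is a rough sketch of what reading only part of the file might look like, assuming the `columns` and `chunksize` parameters of `pd.read_stata` are suitable for this. The toy file, the column names `c0`..`c11`, and the chunk size are all hypothetical stand-ins for `lasso.dta`. Note that `columns` takes column names rather than positions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy file standing in for the real .dta file
toy = pd.DataFrame(
    np.random.default_rng(0).normal(size=(1000, 12)),
    columns=[f"c{i}" for i in range(12)],
)
path = os.path.join(tempfile.mkdtemp(), "toy.dta")
toy.to_stata(path, write_index=False)

# Names of the columns to keep (positions 3..7 and 9..11 in the toy file)
wanted = [f"c{i}" for i in np.r_[3:8, 9:12]]

# Read the file in chunks, keep only the wanted columns,
# and retain a random half of the rows of each chunk
parts = []
with pd.read_stata(path, columns=wanted, chunksize=200) as reader:
    for chunk in reader:
        parts.append(chunk.sample(frac=0.5))

half = pd.concat(parts, ignore_index=True)
```

Only one chunk plus the sampled rows are held in memory at a time, although the final `pd.concat` does build one new frame from the retained pieces.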