I have a rather large dataset (~15 GB zipped). What is the most efficient way to draw a random sample from it using Pandas? Currently I do the following, which simply reads the first 10 million rows and so does not really serve my need:
df = pd.read_csv(file, names=[], sep='|', nrows=10000000)
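What I am hoping for is something closer to the sketch below, which passes a callable to skiprows so that each row is kept with a fixed probability. The total row count is a placeholder I would replace with the real figure, and I have no idea whether this is actually an efficient approach on a file this size:

import random
import pandas as pd

n_total = 100_000_000   # placeholder: total number of rows in the file
n_sample = 10_000_000   # roughly how many rows I want in the sample
keep_prob = n_sample / n_total

# "file" is the same path variable as above; skiprows accepts a callable
# that receives the row index and returns True when the row should be skipped
df = pd.read_csv(file, sep='|', header=None,
                 skiprows=lambda i: random.random() > keep_prob)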
Additionally, is there a way I can filter the data before creating the DataFrame?
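For instance, reading the file in chunks and keeping only the rows that satisfy a condition would effectively filter before the full DataFrame ever exists. In this sketch the condition on column 0 is a made-up stand-in for whatever filter I actually need:

import pandas as pd

# read 1,000,000 rows at a time and keep only the rows that pass the filter;
# the condition (first column > 0) is a placeholder for my real criterion
chunks = pd.read_csv(file, sep='|', header=None, chunksize=1_000_000)
df = pd.concat(chunk[chunk[0] > 0] for chunk in chunks)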
Any help is appreciated :)