I'm not sure why it has to be one CSV file. There are many Python libraries for working with a dataset spread across multiple CSVs.
In all of the examples, you pass a glob pattern that will match multiple files. This pattern works very naturally with Azure ML Datasets, which you can use as your input. See this excerpt from the docs link above.
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'your datastore name'
# get existing workspace
workspace = Workspace.from_config()
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)
# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]  # here's the glob pattern
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
Assuming that all the CSVs can fit into memory, you can easily turn these datasets into pandas DataFrames. With Azure ML Datasets, you call
# get the input dataset by name
dataset = Dataset.get_by_name(workspace, name=dataset_name)
# load the TabularDataset to pandas DataFrame
df = dataset.to_pandas_dataframe()
With a Dask DataFrame, this GitHub issue says you can call
df = my_dask_df.compute()
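For context, a minimal Dask sketch might look like the following (the data/*.csv glob path is illustrative, not from the original):
import dask.dataframe as dd
# lazily read every CSV matching the glob into one Dask DataFrame
my_dask_df = dd.read_csv('data/*.csv')
# materialize the result as a single in-memory pandas DataFrame
df = my_dask_df.compute()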
As far as output datasets go, you can handle this by reading the existing output CSV into a DataFrame, appending the new data to it, then overwriting the file at the same location.
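A rough pandas sketch of that read-append-overwrite pattern (output_path and new_rows are hypothetical names, and the output file is assumed to already exist):
import pandas as pd
output_path = 'outputs/results.csv'  # hypothetical output location
new_rows = pd.DataFrame({'id': [4, 5], 'score': [0.7, 0.9]})  # example rows to append
existing = pd.read_csv(output_path)  # read the current output
combined = pd.concat([existing, new_rows], ignore_index=True)  # append the new data
combined.to_csv(output_path, index=False)  # overwrite the same file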