
I have a utility that loads a sqlite3 table from a CSV file. The CSV file has 2000+ columns, but SQLite3 tables have a limit of 2000 columns. The process I have used so far has been to load the CSV into a dataframe, drop the unrequired columns, and load the rest into a database table. But any additions to the CSV file are going to blow past the column limit. The limit is hardcoded, and I would need to recompile SQLite3 to raise it (which is beyond my pay grade).

Any thoughts on an elegant way to handle this?

The existing load process is as follows:

import pandas as pd

# gvars is an existing application module that holds the sqlite3 connection (gvars.db_conn)

df_survey = pd.read_csv(data_full_path)
print('File {} loaded into dataframe'.format(data_file))

# drop unnecessary columns (anything whose name starts with 'Tag:')

number_columns = len(df_survey.columns)
print('Number of columns in dataframe={}'.format(number_columns))

column_list = list(df_survey.columns.values)
for column in column_list:
    if column.startswith('Tag:'):
        df_survey.pop(column)

number_columns = len(df_survey.columns)
print('Number of columns after purge in dataframe={}'.format(number_columns))

# write the surviving columns to the SQLite table using the shared connection
table_name = 'wave'
df_survey.to_sql(table_name, gvars.db_conn, schema=None, if_exists='replace', index=False)

I haven't tried anything yet - asking for advice before I spin the wheels.

  • What if you split the columns among several tables? Add a key column that is shared across tables, and join on it to retrieve all your columns in a query – joH1 Mar 09 '23 at 08:11
  • Another approach could be to look at your data and store the many fields as JSON properties in a single column, keyed by the given key values. SQLite supports this directly, and then there's no limit. Of course you then can't easily select/sort/update on single values, but only you can tell from your data what is practical. – MyICQ Mar 09 '23 at 11:49
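
A minimal sketch of the table-splitting idea from the first comment, assuming the dataframe has already been loaded and pruned as above. The chunk size, the row_id key column, and the wave_part0/wave_part1/... table names are illustrative choices, not anything from the original post:

import pandas as pd

MAX_COLS_PER_TABLE = 1500  # stay comfortably under SQLite's default 2000-column ceiling

def write_in_chunks(df, conn, base_name='wave_part', key_col='row_id',
                    chunk_size=MAX_COLS_PER_TABLE):
    """Split df column-wise into several tables that share a key column."""
    df = df.copy()
    df[key_col] = range(len(df))  # shared key present in every part table
    data_cols = [c for c in df.columns if c != key_col]
    for i in range(0, len(data_cols), chunk_size):
        part_cols = [key_col] + data_cols[i:i + chunk_size]
        table = '{}{}'.format(base_name, i // chunk_size)
        df[part_cols].to_sql(table, conn, if_exists='replace', index=False)

write_in_chunks(df_survey, gvars.db_conn)

# Columns from different parts can be recombined with a join on the key, e.g.
#   SELECT a.SomeColumn, b.OtherColumn
#   FROM wave_part0 AS a JOIN wave_part1 AS b USING (row_id);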
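
And a minimal sketch of the JSON idea from the second comment: each survey row becomes one JSON document stored in a single TEXT column, so the column limit no longer applies. The wave_json table name and row_id key are illustrative assumptions; DataFrame.to_json takes care of NaN and numpy-type conversion:

import pandas as pd

def write_as_json(df, conn, table='wave_json'):
    """Serialise each dataframe row to JSON and store one document per row."""
    conn.execute('DROP TABLE IF EXISTS {}'.format(table))
    conn.execute('CREATE TABLE {} (row_id INTEGER PRIMARY KEY, doc TEXT)'.format(table))
    # one JSON object per line; NaN becomes null, numpy scalars become plain JSON values
    docs = df.to_json(orient='records', lines=True).splitlines()
    conn.executemany('INSERT INTO {} (row_id, doc) VALUES (?, ?)'.format(table),
                     enumerate(docs))
    conn.commit()

write_as_json(df_survey, gvars.db_conn)

# Individual fields remain queryable through SQLite's built-in JSON functions, e.g.
#   SELECT json_extract(doc, '$.SomeColumn') FROM wave_json;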

0 Answers