Since pandas expects every row to have the same number of columns, don't use it to import this data directly. Your goal is to create three separate dfs, so first import the data as lists and then handle the different lengths yourself.
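To see the problem concretely, here is a minimal sketch that feeds rows of 3, 4 and 5 space-separated fields straight into pd.read_csv. The sample string is hypothetical, reconstructed from the output shown further down, since the real input.csv isn't shown. With the default C parser, the first line fixes the expected field count and the longer rows are rejected.

```python
import io

import pandas as pd

# Hypothetical stand-in for input.csv (rows with 3, 4 and 5 fields).
sample = "1 NATA 18\n1 OLEG FR 18\n1 OLEG US FRANCE BIG\n"

try:
    # The first line tells the parser to expect 3 fields per row...
    pd.read_csv(io.StringIO(sample), sep=' ', header=None)
    parsed = True
except pd.errors.ParserError as exc:
    # ...so the 4-field line is rejected; the message names the expected vs. actual count.
    parsed = False
    print(exc)
```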
One way to solve this is to read the data with csv.reader and create the dfs with list comprehensions that filter on the length of each list.
```python
import csv

import pandas as pd

with open('input.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

# Keep only the rows with 3, 4 or 5 fields, respectively.
df1 = pd.DataFrame([item for item in data if len(item) == 3], columns='ID NAME AGE'.split())
df2 = pd.DataFrame([item for item in data if len(item) == 4], columns='ID NAME COUNTRY AGE'.split())
df3 = pd.DataFrame([item for item in data if len(item) == 5], columns='ID NAME STATE COUNTRY HOBBY'.split())
print(df1, df2, df3, sep='\n\n')
```
```
ID NAME AGE
0 1 NATA 18

ID NAME COUNTRY AGE
0 1 OLEG FR 18

ID NAME STATE COUNTRY HOBBY
0 1 OLEG US FRANCE BIG
```
If you would otherwise need to hardcode many lines for the same step (e.g. many dfs), consider using a loop to create them and storing each DataFrame as a value in a dictionary.
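As a minimal sketch of that idea, assuming the same space-separated format (the sample string is hypothetical, reconstructed from the output above), the rows can be bucketed by field count in a single loop without hardcoding any counts. group_rows_by_length is an illustrative helper name, not a library function.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for input.csv, reconstructed from the output above.
sample = "1 NATA 18\n1 OLEG FR 18\n1 OLEG US FRANCE BIG\n"


def group_rows_by_length(lines, delimiter=' '):
    """Return {field_count: [row, ...]} for every row csv.reader yields."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if row:  # csv.reader yields [] for blank lines; skip them
            groups[len(row)].append(row)
    return dict(groups)


groups = group_rows_by_length(io.StringIO(sample))
for n, rows in sorted(groups.items()):
    print(f'df_{n}:', rows)
```

Each bucket can then be turned into a df with pd.DataFrame(rows), which labels the columns 0, 1, 2, ... when no column names are given.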
EDIT
Here is a slightly optimized way of creating those dfs. I don't think you can get around writing a list of the columns you want for each separate df, so you need to know which column counts occur in your data (unless you want to create those dfs without naming the columns).
```python
import csv

import pandas as pd

col_list = [['ID', 'NAME', 'AGE'], ['ID', 'NAME', 'COUNTRY', 'AGE'], ['ID', 'NAME', 'STATE', 'COUNTRY', 'HOBBY']]

with open('input.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

# One df per column list, keyed by its column count (df_3, df_4, df_5).
dict_of_dfs = {}
for cols in col_list:
    dict_of_dfs[f'df_{len(cols)}'] = pd.DataFrame([item for item in data if len(item) == len(cols)], columns=cols)

for key, val in dict_of_dfs.items():
    print(f'{key=}: \n {val} \n')  # the {key=} specifier needs Python 3.8+
```
```
key='df_3':
ID NAME AGE
0 1 NATA 18

key='df_4':
ID NAME COUNTRY AGE
0 1 OLEG FR 18

key='df_5':
ID NAME STATE COUNTRY HOBBY
0 1 OLEG US FRANCE BIG
```
Now you don't have separate variables for your dfs; instead they are stored as values in a dictionary, under keys named after the number of columns each df has (df_3 is the df with three columns).
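The loop above scans the whole data once per column list. A possible single-pass variant, sketched here under the assumption that every column list has a distinct length (the same assumption the df_{len(cols)} keys already make), maps each length to its columns and routes every row directly. It also collects rows whose length matches no column list, which the list-comprehension version silently drops. split_by_schema and the malformed "2 BAD" row are hypothetical, for illustration only.

```python
import csv
import io

# Column lists from the answer; each must have a distinct length.
col_list = [['ID', 'NAME', 'AGE'],
            ['ID', 'NAME', 'COUNTRY', 'AGE'],
            ['ID', 'NAME', 'STATE', 'COUNTRY', 'HOBBY']]

# Hypothetical input: the three rows from the output above plus one malformed row.
sample = "1 NATA 18\n1 OLEG FR 18\n1 OLEG US FRANCE BIG\n2 BAD\n"


def split_by_schema(lines, col_list, delimiter=' '):
    """Single pass: route each row to the column list with the same length."""
    schemas = {len(cols): cols for cols in col_list}
    buckets = {f'df_{n}': [] for n in schemas}
    unmatched = []
    for row in csv.reader(lines, delimiter=delimiter):
        cols = schemas.get(len(row))
        if cols is not None:
            # Store each row as a {column: value} record.
            buckets[f'df_{len(row)}'].append(dict(zip(cols, row)))
        elif row:  # non-blank row whose length matches no column list
            unmatched.append(row)
    return buckets, unmatched


buckets, unmatched = split_by_schema(io.StringIO(sample), col_list)
print(buckets['df_4'])  # [{'ID': '1', 'NAME': 'OLEG', 'COUNTRY': 'FR', 'AGE': '18'}]
print(unmatched)        # [['2', 'BAD']]
```

Each bucket is a list of dicts, so pd.DataFrame(buckets['df_4']) builds the df with the right column names; if a bucket can be empty, pass the matching column list as columns= so the empty df still keeps its headers.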
If you need to import the data with pandas, you could have a look at this post.