Pretty new to Stack Overflow, please bear with me if the format looks odd.

I have a big set of data with 100+ columns of data structured like:

countrya countryb year variable1 variable2 ...... variable100

I want to split the 100 variables into 100 new dataframes and save each one to a CSV.

Below is the code I have for creating 1 new csv.

dfm1 = pd.melt(df, id_vars=['countrya', 'countryb', 'year'],
               value_vars=['variable1'], value_name='variable1')
dfm1 = dfm1.drop('variable', axis=1)  # drop the 'variable' column that melt adds
dfm1.to_csv('newdf1.csv')

How can I automate the process? Thank you!

js3933

2 Answers


You can use a for loop over all the variables and call your function inside it (assuming your sample code is correct):

def split(df, variable_name):
    dfm = pd.melt(df, id_vars=['countrya', 'countryb', 'year'],
                  value_vars=[variable_name])
    # melt adds a 'variable' column holding the melted column's name; drop it
    # and rename the default 'value' column back to the variable's name
    dfm = dfm.drop('variable', axis=1).rename(columns={'value': variable_name})
    dfm.to_csv('newdf_{}.csv'.format(variable_name))

for variable_name in ['variable1', 'variable2']:
    split(df, variable_name)
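
For reference, a minimal sketch of what pd.melt produces for a single variable (toy data; column names taken from the question) — this is where the extra 'variable' column comes from:

```python
import pandas as pd

# toy frame with the question's column layout
df = pd.DataFrame({
    'countrya': ['US', 'US'],
    'countryb': ['CA', 'MX'],
    'year': [2019, 2020],
    'variable1': [1.0, 2.0],
})

dfm = pd.melt(df, id_vars=['countrya', 'countryb', 'year'],
              value_vars=['variable1'])

# melt always adds a 'variable' column naming the melted column,
# plus a 'value' column holding its values
print(dfm.columns.tolist())      # ['countrya', 'countryb', 'year', 'variable', 'value']
print(dfm['variable'].unique())  # ['variable1']
```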
Kassian Sun
  • when I melt the df, it gives me a 'variable' column with all same values of my variable names. dfm1.drop('variable',axis=1) is to delete the 'variable' column.. Tried to run it and got this error: ValueError: arrays must all be same length – js3933 Aug 13 '20 at 01:34

Here is one way. First, create the data frame.

import pandas as pd

df = pd.DataFrame({
    'country_a': [1, 2, 3],
    'country_b': [4, 5, 6],
    'year': [2018, 2019, 2020],
    'var_a': ['a', 'b', 'c'],
    'var_b': ['x', 'y', 'z']
})

print(df)
   country_a  country_b  year var_a var_b
0          1          4  2018     a     x
1          2          5  2019     b     y
2          3          6  2020     c     z

Second, iterate over the fields with your column names.

base_fields = df.columns[:3].to_list()    # columns in every file
var_fields = df.columns[3:]               # var_a, var_b, ...

for var_field in var_fields:
    file_name = f'{var_field}.csv'
    with open(file_name, 'wt') as handle:
        fields = base_fields + [var_field]
        df.loc[:, fields].to_csv(handle)
        
        print(f'wrote {fields} to {file_name}')


wrote ['country_a', 'country_b', 'year', 'var_a'] to var_a.csv
wrote ['country_a', 'country_b', 'year', 'var_b'] to var_b.csv
                                          ^              ^
                                          last field and file name change
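
As a side note, the open()/handle step is optional: to_csv accepts a file path directly, and index=False suppresses the 0, 1, 2 row-index column. A sketch of the same loop with the same toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    'country_a': [1, 2, 3],
    'country_b': [4, 5, 6],
    'year': [2018, 2019, 2020],
    'var_a': ['a', 'b', 'c'],
    'var_b': ['x', 'y', 'z'],
})

base_fields = df.columns[:3].to_list()    # columns in every file
for var_field in df.columns[3:]:          # var_a, var_b, ...
    # to_csv takes a path directly; index=False drops the row index
    df[base_fields + [var_field]].to_csv(f'{var_field}.csv', index=False)
```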
jsmart
  • this completely solves my question! Thanks so much!!!!!! – js3933 Aug 14 '20 at 01:43
  • Do you mind if I ask a followup question? I'm trying to melt all the new CSVs. The code I have for melt is: table=pd.pivot_table(var_a, values='var_a', index=['countrya','countryb'],columns=['year']) table1=table.reset_index().rename_axis(None, axis=1)... Finding a hard time automate the process again! – js3933 Aug 19 '20 at 01:13
  • Please can you post this as a new question, and include both actual input data and desired output? (It seems too involved for a comment.). This SO post may be helpful for creating the input data: https://stackoverflow.com/a/30424537/13608599 – jsmart Aug 19 '20 at 13:44
  • Hi I posted in this new question: https://stackoverflow.com/questions/63601469/separate-dataframe-into-multiple-new-dataframes-and-bulk-retructure-the-new-dfs – js3933 Aug 26 '20 at 16:08
  • Hi, I just posted my solution :) – jsmart Aug 26 '20 at 16:57