0

I have the following set of rows in a pandas DF:

idx col1 col2 col3 col4
0 A B C D
1 E F G H
1 I J K L
2 M M O P
2 Q R S T

I want to convert each set of indexed rows to CSV and print to file.

So that I end up with a file with one row for idx 0, two rows for idx 1, and two rows for idx 2.

Like so:

file1
col1,col2,col3,col4
A,B,C,D

file2
col1,col2,col3,col4
E,F,G,H
I,J,K,L

file3
col1,col2,col3,col4
M,N,O,P
Q,R,S,T

I have this code, but it only gives me the first row of each index set:

for i, dfr in Template.TEMPLATE_DF.iterrows(): 
        fpath = path + '\\' + dfr['tmpl.title'].lower().replace(' ', '_') + '_' + str(dfr['tmpl.id']) + '.csv'
        dfr=pd.DataFrame(data=dfr).transpose()
        dfr.to_csv(fpath, sep=',', encoding='utf-8', na_rep='NULL', index=False)

What am I missing here?

A.G.
  • 2,089
  • 3
  • 30
  • 52
  • 1
    This might be helpful [Split pandas dataframe based on groupby](https://stackoverflow.com/questions/23691133/split-pandas-dataframe-based-on-groupby) – Henry Ecker May 29 '21 at 16:39
  • `dfs = [x for _, x in df.groupby('idx')]` The write out each `df` in `dfs`. – Henry Ecker May 29 '21 at 16:39
  • Add code to initialize the dataframe - make this a fully running program. Then we can copy / change / paste a working solution. – tdelaney May 29 '21 at 16:42
  • @Henry Ecker: Thanks for your reply. But I am still just getting one row. I definitely have multiple rows that should be there. This is what I did: `dfs = [x for _, x in dfr.groupby('idx')] for df in dfs: df.to_csv(...)` – A.G. May 29 '21 at 17:06
  • No you should've grouped the source DF. `dfs = [x for _, x in Template.TEMPLATE_DF.groupby('idx')]` – Henry Ecker May 29 '21 at 17:08

1 Answers1

3

this will send each grouping to the function, where it should be written to a file. check the fpath though this changes because you are no longer sending a row via iterrows, but a slice of the dataframe, so I used [0] to take the first row of x, but like I said, not sure it works because it's not test.

data='''idx tmpl.id tmpl.title  col3    col4
0   10  title one   C   D
1   20  title two   G   H
1   30  title three K   L
2   40  title four  O   P
2   50  title five  S   T'''

df = pd.read_csv(io.StringIO(data), sep=' \s+', engine='python')
def write_csv(x):
    # print(x['tmpl.title'].iat[0])
    fpath = path + '\\' + x['tmpl.title'].iat[0].lower().replace(' ', '_') + '_' + str(x['tmpl.id'].iat[0]) + '.csv' # this probably isn't correct
    x.transpose()
    x.to_csv(fpath, sep=',', encoding='utf-8', na_rep='NULL', index=False)

# df.groupby(df.index).apply(write_csv)
df.groupby('idx').apply(write_csv)

if idx is not your index, then use

df.groupby('idx').apply(write_csv)
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14
  • Thank you for your reply! This works! BUT, the PATH part isn't working. The code as is will work for the first two groups but on the third one it fails with a index error. I added a try/except and set the index to 1 on the except then it gives a Series error. Tried different combos, no go. I need the path to work. I tried getting the first row of each group like this: grp.nth(0) then I thought Id get the proper attribs, but didnt work either. – A.G. May 29 '21 at 18:03
  • Add your exact columns because I can’t help if your example and your actual data don’t match. – Jonathan Leon May 29 '21 at 18:09
  • If by that you mean column names, then here they are: `'tmpl.builtin_name', 'tmpl.id', 'tmpl.title', 'flds.id', 'flds.field_type', 'flds.tooltip_text', 'flds.name_plural', 'flds.name_singular', 'flds.backref_name', 'flds.backref_tooltip_text', 'flds.allow_multiple', 'flds.allowed_otypes', 'flds.options','flds.builtin_name'`. Grouping by `Template.TEMPLATE_DF.index` – A.G. May 29 '21 at 18:56
  • Here's how to do it: id = str(x['tmpl.id'].unique()[0:1][0]) ttl = str(x['tmpl.title'].unique()[0:1][0]) fpath = path + '\\' + ttl.lower().replace(' ', '_') + '_' + id + '.csv' – A.G. May 29 '21 at 22:18
  • 1
    updated my code with your column titles. i left out the .iat – Jonathan Leon May 29 '21 at 22:21