0

Hi i'm a python beginner. I would like to extract different csv files from one big csv based on a value within a column. My csv looks like this:

IDMember    Risk    Area
1             2       30
2             1       20
3             2       10

I have thousands of value with the same IDMember and i would like to extract one csv for each one of this ID, with the ID within the csv filename. How can i do it?

i'm trying something like this, but i think i'm missing something:

import os
import pandas

df = r'MyCSVPath'
outFolder = r'MyOutputFolder'

col = 'MemberID'

for col in df.groupby('MemberID'):
    df.to_csv(f'{MemberID}.csv')
Carl
  • 33
  • 1
  • 6
  • Does this answer your question? [How to select rows from a DataFrame based on column values?](https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values) – Daemon Painter Mar 18 '20 at 16:27
  • `for i in df['MemberID'].unique(): df[df['MemberID'].eq(i)].to_csv(....)` – anky Mar 18 '20 at 16:30
  • @DaemonPainter: I do not think that this one really addresses OP's question. `groupby` is probably more on topic... – Serge Ballesta Mar 18 '20 at 16:30
  • There are different problems here: 1/ open a csv file and read a csv file into a pandas DataFrame: `open` and `read_csv` 2/ split a csv according to values in one column: `groupby` and iterating on a GroupBy 3/ open a new file inside a folder: `os.path` and again `open` 4/ save a dataframe to a csv file `to_csv`. Maybe you should try to focus one question at a time... I think that reading [ask] could be useful :-) – Serge Ballesta Mar 18 '20 at 16:39

1 Answers1

0

Suppose we are having the dataframe where id's are duplicated.

   IDMember  Risk  Area
0         1     2    30
1         2     1    20
2         3     2    10
3         4     2    15
4         1     2    45
5         2     2    40
6         4     3    50
7         1     3    30


script:


import os
import pandas as pd

df = pd.read_csv('csvFile.csv') #reading the csv file

df1 = df['IDMember'].drop_duplicates(keep='first', inplace=False) #creating dataframe showing unique ids

for elem in df1:  #iterating through each id
  df2 = df.loc[df['IDMember'] == elem]
  print(df2)
  df2.to_csv('IDmember'+str(elem)+'.csv')  #writing to separate csv by each id 
stanna
  • 98
  • 6
  • Thanks, actually is creating all the csv files i need, but they are all the same, only the filename is different. – Carl Mar 19 '20 at 14:41
  • I changed the last row with `df1` instead of `df`, and the content of files is different as i wanted, but there is only the column related to `IDMember` – Carl Mar 19 '20 at 14:51
  • I am sorry, I have mistakingly put print() there. I have edited the post - now it should work. – stanna Mar 19 '20 at 17:20
  • It was not working, it was just giving me empty csv with headers. I Changed the last two rows , with `df1 =`... and `df1.to_csv =`... – Carl Apr 15 '20 at 09:04
  • I have changed name of the dataframe df to df2 - the problem was that df was changed after first iteration and csv files were empty. Hope now it will finally work. – stanna Apr 16 '20 at 10:28