0

I have this dataframe:

data = {'X_1': [1, 10, 2, 5, 0], 'X_2': [11, 30, 16, 31, 31], 'X_3': ['Yes', 'No', 'No', 'Yes', 'No']}  
df = pd.DataFrame(data)  
print(df) 

   X_1  X_2  X_3
0    1   11  Yes
1   10   30   No
2    2   16   No
3    5   31  Yes
4    0   31   No

I would like to create a function that receives as a parameter the dataframe and returns two xlsx files. One with the rows in which values for the variable X_3 are equal to 'No' and the other with the rows in which values for X_3 are equal to 'Yes'.

I was trying to do that this way, but I don't know how to create the function that returns the two files that I need.

def preprocess (df):
    for i in X_3:
        if df.loc[data['X_3'] != 'Yes']:
            return data, data_2

What I expect as the return xlsx files is this:

xlsx file #1:

   X_1  X_2  X_3
1   10   30   No
2    2   16   No
4    0   31   No

xlsx file #2:

   X_1  X_2  X_3
0    1   11  Yes
3    5   31  Yes

Thank you!

2 Answers2

0

Using any one of the good methods of this page, 3 very good methods there. :

import pandas as pd
data = {'X_1': [1, 10, 2, 5, 0], 'X_2': [11, 30, 16, 31, 31], 'X_3': ['Yes', 'No', 'No', 'Yes', 'No']}
df = pd.DataFrame(data)
#print(df)

def mysplit (a):
    df1 = df[df['X_3'] == 'Yes']
    df2 = df[df['X_3'] == 'No']
    print(df1,file=open("file_a.xls", "a", encoding="utf-8"))
    print(df2,file=open("file_b.xls", "a", encoding="utf-8"))
    return
mysplit(df)

I know you said xlsx, but the principle will also work the same with excel writer too.

David Wooley - AST
  • 346
  • 2
  • 4
  • 13
  • Thank you for your useful answer! Although I have another question. Let's say I have 3 categories in column X_3 ('Yes', 'No, and 'Maybe'). How can I store two categories in one of the xlsx files and the remaining one in the other. Appreciate your help. Thank you again :) – Patricia24 Apr 24 '21 at 03:53
  • a. i think there are 1001 ways to do that, b, its 05:35am in gmt london. c. [link](https://www.geeksforgeeks.org/python-pandas-merging-joining-and-concatenating/) d, .def split3 (a): df1 = df[df['X_3'] == 'Yes'] df2 = df[df['X_3'] == 'No'] dfm =df[df['X_3'] == 'maybe'] frames = [df1, dfm] res1 = pd.concat(frames) print(res1,file=open("file34a.txt", "a", encoding="utf-8")) print(df2,file=open("file33b.txt", "a", encoding="utf-8")) return split3(df) e. 2nd seperate or follow up questions arnt allowed on SO! – David Wooley - AST Apr 24 '21 at 04:41
  • see also: https://stackoverflow.com/questions/19790790/splitting-dataframe-into-multiple-dataframes . Though I am sure there alot more versitlle able and more awake people here who could read & answer your questiin alot better than me, (Though having said that, 2nd aditional questions are usually disaproved , disalowed So maybe if you want that 2nd questioed answered better and more attentiom given to it, write-post it as a seperte question in a new question thread. Youll probabkly get better answers!). Shattered. off to sleep – David Wooley - AST Apr 24 '21 at 04:53
0

Patricia,

For boolean indexing in pandas, or querying, I've learned a lot here in last 24, (but not this. This was done in 1 hour since coming home from an important day out).

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

dataquery= df[(df['X_3'] == 'Yes') | (df['X_3'] == 'Maybe')]

use & (for "and") and | (for "or") and make sure the "(" & ")" are well arranged for the boolean logic to work, which im sure youve worked out by now without me, but just posting in for completenes.

so full "fun - test code" for nb's like you and I is:

import pandas as pd
data = {'X_1': [1, 10, 2, 5, 0], 'X_2': [11, 30, 16, 31, 31], 'X_3': ['Yes', 'Maybe', 'No', 'Yes', 'No']}
df = pd.DataFrame(data)
print("Hello start")
print(df)
def mysplit4 (dataframe):

    print("hello 1")
    #def mysplit (a):
    df1 = df[df['X_3'] == 'Yes']
    df2 = df[df['X_3'] == 'No']
    dataquery= df[(df['X_3'] == 'Yes') | (df['X_3'] == 'Maybe')]
    print(df1)
    print("hello 2")
    print(df2)
    print("hello 3")
    print(dataquery)
    print("hello 4")
    return

mysplit4(df)

ref0

ref1 <--- all good, but this was the Tick/coin drop for me :)

ref2 (this even looks even better explination, which i never refered to until finishing the code)

Last word of warning: No one on so is supposed answer or try to answer questions like I do, or done so, (i think) - too personable/sociable/'wafly' with irrelent information they say (and some may well be irrelevent, indeed, or it might be helpful, idk) so I may get mark down because of that if someone else sees my answer (idk).

But I wanted or let you know about and how to employ and use booleans the way you wanted to (and asked) , and not need groupby which I too personally do not favour. So the above code imo is better . (but you can use groupby & booleans combined, up to you)

last thing, as @edchum in ref0 (see below), and others say, it is inefficient to try iterating over a dframe, because that's already done for you and inherent in pandas.

(I'm not the first to say and repeating what I've learned) but intuitively I already knew that. See this interesting redit post for eg.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
David Wooley - AST
  • 346
  • 2
  • 4
  • 13
  • Thank you Patricia. It was for my benefit as well. ! & my pleasure.....! If it has been helpful or answered your question could you consider doing one of these actions? up to u - [what to do when someone answers question on so](https://stackoverflow.com/help/someone-answers). But above all that, and regardless, it was my pleasure . I am kind of seeking a trusted collaborator/chum/partner while im at work for stuff like this, so it might be useful to take my email down in case of similar needs / question in future or you dont & cant get the answers here or elsewhere: davidwooley78@gmail.com – David Wooley - AST Apr 25 '21 at 18:43
  • **"Accepting an answer is not mandatory; do not feel compelled to accept the first answer you receive. Wait until you receive an answer that answers your question well."** if it does. I feel that others are/were better placed and could have answered your question far more better, competently and with useful utility/real expert experience then i did, But as you yourself get more markups/points here, any future questions you may ask (which will get more complex and be more interest to the experts here) will be taken more notice of, acted upon and be seen. Solidarity/tke care, S. – David Wooley - AST Apr 25 '21 at 18:54