1

I have two dataframe df1 and df2,

df1

A B
2 6
5 1
7 3
1 2
9 7
4 7
3 4
8 9

and df2 contains

A  B  A_bin  B_bin  C  D  E
2  6  1      2      5  4  1
5  1  2      1      2  2  4
7  3  3      1      5  1  7
1  2  1      1      8  4  9
9  7  3      3      5  5  8
4  7  2      3      1  8  5
3  4  1      2      2  9  3
8  9  3      3      4  6  2

I am trying to select only those specific rows selected from df2 to a new data frame df_result_A for all the row that has A_bin = 1 similarily, a separate data frame df_result_B having all those rows of df2 such that B_bin rows contain 1. I am finding it difficult to put my logic incorrect syntax or probably my logic is wrong,

for i in range(len(df1(df2[columns])+len(df)):
    if(row value is 1)
print in df_result_A
print in df_result_B

As the challenge is to not use column name and indexing, as the code should run for other data set as well I am trying to first iterate over the first two column of df2 as len(df1) will let me know that after 2 columns A_bin and B_bin will come. thus, when I am on the first column of df2 then adding len(df1) will put me on A_bin and iterating over it for checking value to be 1 and storing it in a separate dataframe. Similarly, when I am on 2nd column of df2 adding len(df2) will put me on B_bin and thus storing its result in df_result_B. expected result in separate dataframe.

df_result_A

A  B   C  D  E
2  6   5  4  1
1  2   8  4  9
3  4   2  9  3

df_result_b

A B C D E
5 1 2 2 4
7 3 5 1 7
1 2 8 4 9
akilesh raj
  • 656
  • 1
  • 8
  • 19
phoenix4
  • 59
  • 9
  • Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – Bollehenk Oct 30 '18 at 10:54
  • please read the question, i cannot use column name or indexing (i have mentioned it in bold letters) @Bollehenk – phoenix4 Oct 30 '18 at 10:58

1 Answers1

2

You can do something like this:

Sample dataframes:

In [31]: df1
Out[31]: 
   A  B
0  2  6
1  5  1
2  7  3
3  1  2
4  9  7
5  4  7
6  3  4
7  8  9

In [36]: df2
Out[36]: 
   A  B  A_bin  B_bin  C  D  E
0  2  6      1      2  5  4  1
1  5  1      2      1  2  2  4
2  7  3      3      1  5  1  7
3  1  2      1      1  8  4  9
4  9  7      3      3  5  5  8
5  4  7      2      3  1  8  5
6  3  4      1      2  2  9  3
7  8  9      3      3  4  6  2

Have a variable count and an empty dictionary to store new dataframes on the fly.

count = 0
d = dict()

I've used your logic to pick every 3rd column based on the length of columns of df1.

for col in df2.columns:
    print(col)
    l = df1.shape[1]
    if count < l:
        d[col] = df2[ df2.iloc[:, count + l] == 1 ]
    count += 1

This loops over columns of df2 until the variable count is less than number of columns of df1. It filters the rows of 3rd and 4th cols(as per your example) with value of 1 and stores in the dictionary d.

Now, you can loop over your dictionary and find the new dataframes you wanted:

In [52]: for key in d.keys():
    ...:     print(d[key][d[key].columns.drop(list(d[key].filter(regex='bin')))])

   A  B  C  D  E
0  2  6  5  4  1
3  1  2  8  4  9
6  3  4  2  9  3


   A  B  C  D  E
1  5  1  2  2  4
2  7  3  5  1  7
3  1  2  8  4  9

So, you don't have to worry about the exact column names. Let me know if this helps.

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
  • thanks, it gave the desired output but i need to store the output in different dataframe each time we iterate through bin columns (A_bin, B_bin, C_bin and so on), suppose if we had more bin column in different dataframe the result should be fetched in different dataframe everytime for each bin when we check for the condition in bin column @Mayank Porwal – phoenix4 Oct 30 '18 at 13:11
  • 1
    Yes, even in my solution result is getting stored in a different `dataframe` in dictionary. You can access each dataframe by looping over the `dictionary d` as I showed in last command. Just try and print every key of the dictionary like `for key in d.keys(): print(d[key])`. This will output a separate df each time. – Mayank Porwal Oct 30 '18 at 13:14
  • suppose if i want to retrieve dataframe which was result of condition on B_bin, how can i retrieve that particular dataframe from dictionary without mentioning the column name. Is there a way that each time it checks for condition on bin columns the output that stored in some dataframe with standard name for example: df_output1, df_output2 where output number depends upon the bin columns in dataframe. @Mayank Porwal – phoenix4 Oct 30 '18 at 13:29
  • 1
    The number of dataframe stored in dicitionary will be equal to the number of Bin columns in `df2`. They will be stored in dict in the same order they occur in `df2`. So, 1st df in dict will have data for `A_bin`, 2nd will have data for `B_bin` and so on. The point here is, the loop runs on the whole `df2`, and when the loop finishes, it will return all dataframes for bin columns at once. – Mayank Porwal Oct 30 '18 at 13:34
  • so it is possible to retrieve a particular dataframe from dictionary @Mayank Porwal – phoenix4 Oct 30 '18 at 13:36
  • Yes you can, by the key name. So, df['A'] will have `A_bins` data. df['B'] will have `B_bins` data and so on. – Mayank Porwal Oct 30 '18 at 13:39
  • is it possible to retrieve without giving column name @Mayank Porwal – phoenix4 Oct 30 '18 at 13:43
  • No. But, I am so lost right now. You always know what you have in `df1`(the one with only `A and B`) based on which the above logic stands. So, you know parent column names. – Mayank Porwal Oct 30 '18 at 13:45
  • i am trying to make a template code that can run on any dataframe that can have any number of column and column names might differ ,so i am trying to make the code in which i just have to change the file name and it does the rest of the work without modifying any parameter or column name @Mayank Porwal – phoenix4 Oct 30 '18 at 13:48
  • Yes, my code handles all that you want. You tell me, how will you map the new generated dataframes to your original columns, which dataframe belongs to which column.? – Mayank Porwal Oct 30 '18 at 14:00
  • i want to compare the data (min max) from column 'A' with bin value=7 compare to column 'B' with bin value=7 @Mayank Porwal – phoenix4 Oct 31 '18 at 04:29
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/182865/discussion-between-phoenix4-and-mayank-porwal). – phoenix4 Oct 31 '18 at 13:51