I have two DataFrames with the same index and columns. I modified the first one, and I would like to somehow index out the second one so that it is now the length of the first one. The code for both looks like this:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                   'b': ['.2', '-.9', '.7', '.89'],
                   'c': ['.3', '.22', '.4', '.98'],
                   'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                   'b': ['9', '7', '6', '5'],
                   'c': ['1', '14', '9', '5'],
                   'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])
count=0

for i in df1.index:
    d = pd.DataFrame()
    d = df1.iloc[[count]]
    count = count+1
    d = d.T
    d.columns = ['Dates'] 
    try:
        d.sort_values(by=['Dates'], inplace=True)
    except KeyError as err:
        print(err)
    d.dropna(inplace=True)
    d['Dates'] = d['Dates'][:10]
    print(d)
count = 0
for y in df2.index:
    df = pd.DataFrame()
    df = df2.iloc[[count]]
    count = count+1
    df = df.T
    df.columns = ['Dates'] 
    df.dropna(inplace=True)
    print(df)

The df1 loop prints:

     Dates
b    .2
c    .3
d    .5
a   .81
  Dates
b   -.9
c   .22
d   .45
a   1.2
  Dates
d   .34
c    .4
a   .67
b    .7
  Dates
a   .78
b   .89
d   .92
c   .98

The df2 loop prints:

   Dates
a     1
b     9
c     1
d     3
  Dates
a     2
b     7
c    14
d    12
  Dates
a     3
b     6
c     9
d     2
  Dates
a     4
b     5
c     5
d    34

As the code shows, I sort each row from smallest to largest and then keep the first 10 entries. I understand that none of the rows in this example have more than 10 values, but I need to index like this because it will matter once I use a much larger dataset. Now I would like to select values in the second DataFrame using the index of the first one. For example, if the second DataFrame has 30 rows, it should end up with only 10, namely the 10 that remain in the first DataFrame.

Edit: The problem is this:

for i in df1.index:
    # create 10 new dataframes
    d = pd.DataFrame()
    d = df1.iloc[[count]]
    count = count+1

This code creates 10 new DataFrames from a larger DataFrame. If I were to use .loc in the second for loop, it would look like:

for i in df2.index:
    # create 10 new dataframes
    df = pd.DataFrame()
    df = df2.iloc[[count]]
    count = count+1

If I used d.loc inside this second loop, it would be a problem, because d is equal to the last DataFrame created in the first loop rather than all of them.

benito.cano
  • Please produce a Minimal Reproducible Example; it'll be easier to help you if you do so! – zabop Jan 06 '21 at 22:43
  • Thank you for responding @zabop. I can try to do this by making the output of the two for loops smaller. I already tried to minimize this from my main code, and I have tried to do it on another, smaller dataset, but I can't find a way to do it correctly. Would creating a smaller output help you with the problem? – benito.cano Jan 06 '21 at 22:49
  • Yeah, create a df something like in this question: https://stackoverflow.com/q/63583502/8565438 :) – zabop Jan 06 '21 at 22:51
  • Okay will do, thank you – benito.cano Jan 06 '21 at 22:52
  • Just found now, also useful: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – zabop Jan 06 '21 at 23:24
  • @zabop Okay so I added a sample dataframe, does this help or is the question still confusing? – benito.cano Jan 07 '21 at 02:23
  • Yeah, much better now. I still don't get exactly what you mean by "somehow index out the second one so that it is now the length of the first one", though; it might be just me. – zabop Jan 07 '21 at 13:11
  • Okay, to clarify: the first and second DataFrames have the same index. The first DataFrame was sorted and trimmed, so it has become smaller. Now that the first DataFrame is smaller, I need the second DataFrame to drop the index labels that the first one no longer has after it was modified. I don't know if this helps; I can try to explain it in a different manner. – benito.cano Jan 07 '21 at 13:43
  • Added an answer now, hope it helps. – zabop Jan 07 '21 at 14:12

1 Answer

Using your example dataframes, let's say df1 gets smaller by drop()ping the row with index=2:

df1.drop(2,inplace=True)

df1 will now be:

     a    b    c    d
0  .81   .2   .3   .5
1  1.2  -.9  .22  .45
3  .78  .89  .98  .92

You can modify the indices in whatever way you wish. Then, to select the rows of df2 whose labels are present in df1, you can do:

df2.loc[df1.index]

giving you:

   a  b   c   d
0  1  9   1   3
1  2  7  14  12
3  4  5   5  34
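
One caveat worth hedging: in recent pandas versions, .loc with a list of labels raises a KeyError if any label is missing from df2. That cannot happen when df1 was only ever made smaller, but if the two indexes might diverge, a defensive sketch is to take the intersection first. The frames below are shortened stand-ins for your data, and the extra label 7 is an assumption added only to show the effect:

```python
import pandas as pd

# Shortened stand-ins for the frames in the question
df1 = pd.DataFrame({'a': ['.81', '1.2', '.78', '.5']}, index=[0, 1, 3, 7])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5']},
                   index=[0, 1, 2, 3])

# Label 7 exists only in df1, so df2.loc[df1.index] would raise KeyError.
# Keeping only the labels present in both frames avoids that.
common = df2.index.intersection(df1.index)
subset = df2.loc[common]
print(subset)
```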

If you only need some columns, say c and d:

df2.loc[df1.index,['c','d']]

giving you:

    c   d
0   1   3
1  14  12
3   5  34
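
If the trimming happens per row inside a loop, as in your code, the same idea can be applied within a single loop instead of two, so the labels kept from df1 are still at hand when df2 is sliced. This is only a sketch under a few assumptions: the values are converted to float before sorting (your code sorts the strings as they are), n is a hypothetical name for the cutoff (10 in your code, 2 here so the effect is visible), and selected is a hypothetical dict used to collect the results:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                    'b': ['.2', '-.9', '.7', '.89'],
                    'c': ['.3', '.22', '.4', '.98'],
                    'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5'],
                    'c': ['1', '14', '9', '5'],
                    'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])

n = 2          # assumed cutoff; the question uses 10
selected = {}  # hypothetical container: row label -> trimmed df2 values

for row_label in df1.index:
    # Sort one row of df1 numerically and keep the n smallest entries
    d = df1.loc[row_label].astype(float).dropna().sort_values().head(n)
    # Use the surviving labels to pick the matching entries of df2
    selected[row_label] = df2.loc[row_label, d.index]

print(selected[2])
```

Because both frames are handled in the same iteration, d never refers to a stale frame from an earlier loop.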
zabop
  • Yes, this does help. The only issue is that df1 and df2 are DataFrames created in a for loop, so if I tried to use .loc in the second for loop I would have a problem, because df1 holds just one DataFrame, not all of the DataFrames created by the first for loop. – benito.cano Jan 07 '21 at 14:23
  • I don't know if I explained myself well there, but I will add a better explanation of this problem as an edit. – benito.cano Jan 07 '21 at 14:24
  • So I think there would be two solutions: one is a solution to what I added to the question in the edit section, and the other would be to find a way to write the same code without for loops. – benito.cano Jan 07 '21 at 14:31
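
For the loop-free route mentioned in the last comment, one possible sketch uses NumPy to find, for every row at once, the column positions of the n smallest df1 values and then reads the same positions from df2. It assumes the values convert cleanly to float and contain no NaN; n, order, picked_labels and picked_df2 are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                    'b': ['.2', '-.9', '.7', '.89'],
                    'c': ['.3', '.22', '.4', '.98'],
                    'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5'],
                    'c': ['1', '14', '9', '5'],
                    'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])

n = 2  # assumed cutoff; the question uses 10

# Column positions of the n smallest df1 values in each row
order = np.argsort(df1.astype(float).to_numpy(), axis=1)[:, :n]

# Column labels kept per row, and the df2 values at those same positions
picked_labels = df1.columns.to_numpy()[order]
picked_df2 = np.take_along_axis(df2.to_numpy(), order, axis=1)

print(picked_labels)
print(picked_df2)
```

Each row of picked_df2 lines up with the same row of picked_labels, so the pairing between a kept label and its df2 value is preserved without any Python-level loop.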