I have two DataFrames with the same index and columns. I modified the first one, and I would like to index the second one so that it ends up the same length as the first. The code looks like this:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                    'b': ['.2', '-.9', '.7', '.89'],
                    'c': ['.3', '.22', '.4', '.98'],
                    'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5'],
                    'c': ['1', '14', '9', '5'],
                    'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])
count = 0
for i in df1.index:
    d = pd.DataFrame()
    d = df1.iloc[[count]]
    count = count + 1
    d = d.T
    d.columns = ['Dates']
    try:
        d.sort_values(by=['Dates'], inplace=True)
    except KeyError:
        print(KeyError)
    d.dropna(inplace=True)
    d['Dates'] = d['Dates'][:10]
    print(d)
count = 0
for y in df2.index:
    df = pd.DataFrame()
    df = df2.iloc[[count]]
    count = count + 1
    df = df.T
    df.columns = ['Dates']
    df.dropna(inplace=True)
    print(df)
The df1 for loop outputs:
  Dates
b    .2
c    .3
d    .5
a   .81
  Dates
b   -.9
c   .22
d   .45
a   1.2
  Dates
d   .34
c    .4
a   .67
b    .7
  Dates
a   .78
b   .89
d   .92
c   .98
The df2 for loop outputs:
  Dates
a     1
b     9
c     1
d     3
  Dates
a     2
b     7
c    14
d    12
  Dates
a     3
b     6
c     9
d     2
  Dates
a     4
b     5
c     5
d    34
As the code shows, I sort each row from smallest to largest and then keep the first 10 entries. I understand that none of the rows in this example has more than 10 values, but I need to index like this for a much larger dataset, where the cutoff will matter. Now I would like to select values in the second DataFrame using the index of the first. For example, if the second DataFrame has 30 rows, it should end up with only 10, and those 10 should be the ones kept from the first DataFrame.
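To make the goal concrete, here is a minimal sketch of the result I am after, not a definitive solution. The helper name `top_n_pair` and the cutoff `n` are hypothetical (the post uses 10; the sketch uses 2 so the cutoff is visible on this small example). It assumes that the labels kept from the sorted df1 row can be passed straight to `.loc` on the matching df2 row. Note that the values are strings, so they sort as strings, which is consistent with the output shown above.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                    'b': ['.2', '-.9', '.7', '.89'],
                    'c': ['.3', '.22', '.4', '.98'],
                    'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5'],
                    'c': ['1', '14', '9', '5'],
                    'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])


def top_n_pair(df1, df2, label, n):
    """Hypothetical helper: sort one row of df1, keep the first n entries,
    and select the same labels from the matching row of df2."""
    # Turn the df1 row into a single 'Dates' column, then sort and truncate.
    d = df1.loc[[label]].T
    d.columns = ['Dates']
    d = d.sort_values(by='Dates').dropna().head(n)

    # Turn the df2 row into a single 'Dates' column and reuse d's labels.
    df = df2.loc[[label]].T
    df.columns = ['Dates']
    return d, df.loc[d.index]


n = 2  # assumed cutoff for this small example; the real data would use 10
for label in df1.index:
    d, df = top_n_pair(df1, df2, label, n)
    print(d)
    print(df)
```

For row 0, the first frame keeps labels `b` and `c` from df1, and the second frame holds the df2 values for those same two labels.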
Edit: The problem is this:
for i in df1.index:
    # create 10 new dataframes
    d = pd.DataFrame()
    d = df1.iloc[[count]]
    count = count + 1
This code creates 10 new DataFrames from a larger DataFrame. If I were to use .loc in the second for loop, it would look like:
for i in df2.index:
    # create 10 new dataframes
    df = pd.DataFrame()
    df = df2.iloc[[count]]
    count = count + 1
If I then used d.loc inside this second for loop, that would be a problem, because d refers only to the last DataFrame created in the first loop rather than to all of them.
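One possible way around the overwritten `d`, sketched under the assumption that it is acceptable to keep every per-row frame in memory: store each frame in a dictionary keyed by its row label, so the second loop can look up the frame that belongs to the same row. The names `sorted_frames`, `selected`, and `n` are hypothetical and chosen only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['.81', '1.2', '.67', '.78'],
                    'b': ['.2', '-.9', '.7', '.89'],
                    'c': ['.3', '.22', '.4', '.98'],
                    'd': ['.5', '.45', '.34', '.92']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'a': ['1', '2', '3', '4'],
                    'b': ['9', '7', '6', '5'],
                    'c': ['1', '14', '9', '5'],
                    'd': ['3', '12', '2', '34']},
                   index=[0, 1, 2, 3])

n = 2  # assumed cutoff for this small example; the real data would use 10

# First loop: keep every per-row frame instead of overwriting d each time.
sorted_frames = {}
for label in df1.index:
    d = df1.loc[[label]].T
    d.columns = ['Dates']
    sorted_frames[label] = d.sort_values(by='Dates').dropna().head(n)

# Second loop: look up the frame built for the same row label and
# use its index to select from the matching df2 row.
selected = {}
for label in df2.index:
    df = df2.loc[[label]].T
    df.columns = ['Dates']
    selected[label] = df.loc[sorted_frames[label].index]

for label, frame in selected.items():
    print(frame)
```

Keying by the row label rather than by a running `count` also means the two loops do not depend on both DataFrames being iterated in the same order.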