Selecting rows with the highest value based on 1 column in the dataframe

Question

I have a dataframe with about 20k rows and the headings X, Y, Z, I, R, G, B (yes, it's a point cloud).

I would like to create numerous sub-dataframes by sorting the data by column X and grouping it into blocks of 100 rows. I would then like to sort each sub-dataframe by column Y and break it down further into blocks of 50 rows. The end result should be a set of sub-dataframes of 50 rows each. From each of these I would like to pick out all the rows with the highest Z value and write them to a CSV file.

I have got as far as the following code, but I am not sure how to continue.

import pandas as pd

# one name per column in the file
headings = ['X', 'Y', 'Z', 'I', 'R', 'G', 'B']
data = pd.read_csv('file.csv', skiprows=[0], names=headings)

points = data.sort_values(by=['X'])

How about slicing the dataframe with iteration? https://stackoverflow.com/questions/47337328/pandas-dataframe-slicing-with-iteration - Poonam Adhav, Jan 06 '19 at 17:20

score 0 · Answer 1 · answered Jan 06 '19 at 18:40

Considering a dummy dataframe of 1000 rows,

df.head()   # first 5 rows

    X   Y   Z   I   R   G   B
0   6   6   0   3   7   0   2
1   0   8   3   6   5   9   7
2   8   9   7   3   0   4   5
3   9   6   8   5   1   0   0
4   9   0   3   0   9   2   9
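
The answer does not show how the dummy dataframe was built. One way to get a comparable one, as a rough sketch: the code below assumes random integers from 0 to 9 in every column, and the seed is arbitrary, so the values will not match the rows printed above.

```python
import numpy as np
import pandas as pd

# Assumed construction: 1000 rows of random integers in 0..9,
# one column per heading from the question.
rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(
    rng.integers(0, 10, size=(1000, 7)),
    columns=['X', 'Y', 'Z', 'I', 'R', 'G', 'B'],
)
```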

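First, extract the highest value of Z from the whole dataframe. Then sort by X, split into chunks of 100 rows, sort each chunk by Y, split it again into chunks of 50 rows, and collect the rows whose Z equals that maximum:

import pandas as pd

z_max = df['Z'].max()    # global maximum of Z over the whole dataframe
df = df.sort_values('X')

# list of dataframes, 100 rows each
dfs_X = [df.iloc[i:i + 100] for i in range(0, len(df), 100)]

matches = []
for df_x in dfs_X:
    df_x = df_x.sort_values('Y')
    # split each X chunk further into pieces of 50 rows
    dfs_Y = [df_x.iloc[j:j + 50] for j in range(0, len(df_x), 50)]
    for df_y in dfs_Y:
        matches.append(df_y[df_y['Z'] == z_max])

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
results = pd.concat(matches)
results.head()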

results will contain the rows from all sub-dataframes whose Z equals the highest value of Z.

Output: First 5 rows

    X   Y   Z   I   R   G   B
541 0   0   9   0   3   6   2
610 0   2   9   3   0   7   6
133 0   4   9   3   3   9   9
731 0   5   9   5   1   0   2
629 0   5   9   0   9   7   7

Now, write this dataframe to a CSV file using results.to_csv().
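
Note that the loop in this answer compares every chunk against the single global maximum z_max. With real point-cloud data, where Z is a float, most chunks will probably contain no row equal to that value. If what is wanted is the highest Z within each 50-row chunk, as the question describes, something along these lines might work. This is only a sketch: top_z_per_chunk and its parameters are made-up names, and the demo data is random.

```python
import numpy as np
import pandas as pd


def top_z_per_chunk(df, x_size=100, y_size=50):
    """Return the rows holding the largest Z within each final chunk.

    Hypothetical helper: rows are sorted by X and cut into blocks of
    x_size rows; each block is sorted by Y and cut into blocks of y_size.
    """
    picked = []
    by_x = df.sort_values('X')
    for i in range(0, len(by_x), x_size):
        by_y = by_x.iloc[i:i + x_size].sort_values('Y')
        for j in range(0, len(by_y), y_size):
            chunk = by_y.iloc[j:j + y_size]
            # keep every row that ties for this chunk's own maximum
            picked.append(chunk[chunk['Z'] == chunk['Z'].max()])
    return pd.concat(picked)


# Small made-up example: 200 rows of random single-digit values
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 10, size=(200, 7)),
                    columns=list('XYZIRGB'))
top = top_z_per_chunk(demo)
```

Calling top_z_per_chunk(df).to_csv('highest_z.csv', index=False) would then write the selected rows to a file; the file name here is just a placeholder.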
