Map counts of a numerical column from a new DataFrame to the bin range column of training data

Question

I am trying to get the count of Age column and append it to my existing bin-range column created. I am able to do it for the training df and want to do it for prediction data. How do I map values of count of Age column from prediction data to to Age_bin column in my training data? The first one is my output DF whereas the 2nd one is the sample DF. I can get the count using value_counts() for the file I am reading.

First image - bin and count from training data
Second image - Training data
Third image - Prediction data
Fourth image - Final output

.

Will you please add a sample dataframe to you question that shows the output you're try to get? — , Nov 26 '21 at 23:46
The attached picture is the output of my current CSV file. Going forward I would want the `Age_count` to give me counts from a new CSV file / data frame — self.Fool, Nov 26 '21 at 23:49
As it is, I don't have enough information to help you. Please provide samples of all dataframe(s) you're working with, and a sample dataframe containing your expected output. Thank you :) — , Nov 26 '21 at 23:51
Is it clear now? The first DF is my desired output but the count column is taken from other dataframe (attached below) — self.Fool, Nov 27 '21 at 00:04
So are you trying to map Salary column in the 2nd df to the Age_bin column of the 1st df?\ — , Nov 27 '21 at 00:05
No, map the count of 2nd df `Age` column to `Age_bin` column. — self.Fool, Nov 27 '21 at 00:09
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239621/discussion-between-self-fool-and-user17242583). — self.Fool, Nov 27 '21 at 00:10

score 0 · Answer 1 · answered Nov 27 '21 at 00:29

0

The Data

import pandas as pd 

data = { 
     0: 0, 
     11: 1500, 
     12: 1000, 
     22: 3000, 
     32: 35000,
     34: 40000, 
     44: 55000, 
     65: 7000, 
     80: 8000, 
     100: 1000000, 
}

df = pd.DataFrame(data.items(), columns=['Age', 'Salary'])

   Age   Salary
0    0        0
1   11     1500
2   12     1000
3   22     3000
4   32    35000
5   34    40000
6   44    55000
7   65     7000
8   80     8000
9  100  1000000

The Code

bins = [-0.1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# create a "binned" column 
df['binned'] = pd.cut(df['Age'], bins)

# add bin count
df['count'] = df.groupby('binned')['binned'].transform('count')

The Output

   Age   Salary         binned  count
0    0        0   (-0.1, 10.0]      1
1   11     1500   (10.0, 20.0]      2
2   12     1000   (10.0, 20.0]      2
3   22     3000   (20.0, 30.0]      1
4   32    35000   (30.0, 40.0]      2
5   34    40000   (30.0, 40.0]      2
6   44    55000   (40.0, 50.0]      1
7   65     7000   (60.0, 70.0]      1
8   80     8000   (70.0, 80.0]      1
9  100  1000000  (90.0, 100.0]      1

answered Nov 27 '21 at 00:29

Sy Ker

2,047
1
4
20

I am fine with the bins part. As per my understanding of the above gives the count of binned column count whereas I would like to have the count of Age column from a new csv which has both Age and Salary. – self.Fool Nov 27 '21 at 00:37
I'm not sure I understand your problem. So you want to count how many 22 year old there are in one dataframe and add this to another dataframe containing only the age bin 20-30? (for example, I get that there are more bins and more ages) – Sy Ker Nov 27 '21 at 00:42
I have created Age_bins and Age_count column from my training dataset (which has columns Age & Salary). Now I want to read the Age_count from a new dataset (which has both Age and salary columns) and append Age_count to my existing Age_bins. Can you join the chat if required? – self.Fool Nov 27 '21 at 01:22
I think this is what you need: https://stackoverflow.com/questions/23361218/pandas-dataframe-merge-summing-column . Essentialy, you create two dataframes with the same columns. Than you "sum-merge" the two dataframes on the "binned" column. Sorry, it's very late for me. Hope this helps. – Sy Ker Nov 27 '21 at 01:39

Map counts of a numerical column from a new DataFrame to the bin range column of training data

1 Answers1

The Data

The Code

The Output