How do I split a single dataframe into multiple dataframes by the range of a column value?

Question

First off, I realize that this question has been asked a ton of times in many different forms, but a lot of the answers just give code that solves the problem without explaining what the code actually does or why it works.

I have an enormous data set of phone numbers and area codes that I have loaded into a dataframe in Python for processing. Before I do that processing, I need to split the single dataframe into multiple dataframes, each containing the phone numbers in a certain range of area codes, so that I can process each one separately. For example:

+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 5501231234   | 550       |
+---+--------------+-----------+
| 2 | 5051231234   | 505       |
+---+--------------+-----------+
| 3 | 5001231234   | 500       |
+---+--------------+-----------+
| 4 | 6201231234   | 620       |
+---+--------------+-----------+

into

area-codes (500-550)
+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 5501231234   | 550       |
+---+--------------+-----------+
| 2 | 5051231234   | 505       |
+---+--------------+-----------+
| 3 | 5001231234   | 500       |
+---+--------------+-----------+

and

area-codes (600-650)
+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 6201231234   | 620       |
+---+--------------+-----------+

I get that this should be possible using pandas (specifically groupby and a Series object I think) but the documentation and examples on the internet I could find were a little too nebulous or sparse for me to follow. Maybe there's a better way to do this than the way I'm trying to do it?

https://stackoverflow.com/a/33742822/11610186 – Grzegorz Skibinski, Aug 26 '19 at 13:34

score 2 · Accepted Answer · answered Aug 26 '19 at 13:32

You can use pd.cut to bin the area_code column, then group the data by the resulting labels and store each group in a dictionary. Finally, print the value for a key to see the corresponding dataframe:

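import pandas as pd

# Bin edges and one label per interval: [500, 550], (550, 600], (600, 650]
bins = [500, 550, 600, 650]
labels = ['500-550', '550-600', '600-650']

# Group rows by the bin their area_code falls in; one sub-frame per label
d = {f'area_code_{i}': g for i, g in
     df.groupby(pd.cut(df.area_code, bins, include_lowest=True, labels=labels))}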

print(d['area_code_500-550'])
print('\n')
print(d['area_code_600-650'])

   phone_number  area_code
0    5501231234        550
1    5051231234        505
2    5001231234        500


   phone_number  area_code
3    6201231234        620
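
To see why this works, it can help to look at the intermediate result of pd.cut on its own. The following is a minimal sketch, assuming the sample data from the question; the names binned and parts are illustrative, and passing observed=True to groupby is an addition that is not part of the code above (it skips bins that contain no rows).

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "phone_number": [5501231234, 5051231234, 5001231234, 6201231234],
    "area_code": [550, 505, 500, 620],
})

bins = [500, 550, 600, 650]
labels = ["500-550", "550-600", "600-650"]

# pd.cut maps each area code to the label of the interval it falls in.
# Intervals are closed on the right by default; include_lowest=True also
# admits the left edge of the first interval, so 500 is kept.
binned = pd.cut(df["area_code"], bins, include_lowest=True, labels=labels)
print(binned.tolist())  # ['500-550', '500-550', '500-550', '600-650']

# Grouping by that Series yields one (label, sub-frame) pair per bin.
# observed=True (an assumption, not in the answer) drops empty bins
# such as '550-600' in this sample.
parts = {label: group for label, group in df.groupby(binned, observed=True)}
print(list(parts))  # ['500-550', '600-650']
```

Note that a boundary value such as 550 lands in '500-550' rather than '550-600', because each interval includes its right edge.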

answered Aug 26 '19 at 13:32 by anky

Incredibly simple and efficient. Worked like a charm and I also understand what's happening haha, thank you! – Tom Aug 26 '19 at 14:57
@Tom Glad I could help. happy coding :) – anky Aug 26 '19 at 15:14

score 0 · Answer 2 · edited Aug 26 '19 at 16:00

You can also do this by selecting rows with a boolean mask, chaining multiple conditions with the & or | operator:

df1 selects the rows with area_code between 500 and 550 (inclusive)
df2 selects the rows with area_code between 600 and 650 (inclusive)

import pandas as pd

df = pd.DataFrame({'phone_number': [5501231234, 5051231234, 5001231234, 6201231234],
                   'area_code': [550, 505, 500, 620]},
                  columns=['phone_number', 'area_code'])
# Each comparison yields a boolean Series; & keeps rows where both are True
df1 = df[(df['area_code'] >= 500) & (df['area_code'] <= 550)]
df2 = df[(df['area_code'] >= 600) & (df['area_code'] <= 650)]

df1
   phone_number  area_code
0    5501231234        550
1    5051231234        505
2    5001231234        500

df2
   phone_number  area_code
3    6201231234        620
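
If there are many ranges, the same masking idea can be written as a loop instead of one variable per range. This is only a sketch under stated assumptions: the ranges dictionary and the frames name are hypothetical, and Series.between is swapped in for the explicit >= and <= comparisons (it is inclusive on both ends by default, so the result should match).

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "phone_number": [5501231234, 5051231234, 5001231234, 6201231234],
    "area_code": [550, 505, 500, 620],
})

# Hypothetical range table: name -> (low, high), both ends inclusive.
ranges = {"500-550": (500, 550), "600-650": (600, 650)}

# Series.between(low, high) is equivalent to (s >= low) & (s <= high).
frames = {
    name: df[df["area_code"].between(low, high)]
    for name, (low, high) in ranges.items()
}

print(frames["500-550"])
print(frames["600-650"])
```

Unlike pd.cut, this approach silently leaves out any row whose area code falls outside every listed range, which may or may not be what you want.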
