Questions tagged [binning]

Binning is the process of grouping data into "bins"; it is widely used in statistics and data analysis. For details, see also Data binning on Wikipedia.
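
As a rough illustration of the definition above, here is a minimal sketch in plain Python that assigns each value to a half-open bin `[edges[i], edges[i+1])` and counts the values per bin. The helper name `bin_index`, the sample edges, and the sample values are all hypothetical and chosen only for this example; libraries such as pandas and NumPy provide their own binning functions with their own edge conventions.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if out of range.

    `edges` must be sorted in ascending order (assumption of this sketch).
    """
    if value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges that are <= value; subtract 1 for the bin index.
    return bisect_right(edges, value) - 1


# Hypothetical bin edges and data, for illustration only.
edges = [0, 10, 25, 50, 100]
values = [3, 10, 46.5, 44.2, 42.12, 99, 100]

# Count how many values fall into each bin (None collects out-of-range values).
counts = Counter(bin_index(v, edges) for v in values)

for i in range(len(edges) - 1):
    print(f"[{edges[i]}, {edges[i + 1]}): {counts[i]}")
```

Treating the right edge as exclusive is a design choice of this sketch: a value equal to the last edge (here 100) is counted as out of range rather than placed in the final bin.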

684 questions
226 votes, 10 answers

Histogram using gnuplot?

I know how to create a histogram (just use "with boxes") in gnuplot if my .dat file already has properly binned data. Is there a way to take a list of numbers and have gnuplot provide a histogram based on ranges and bin sizes the user provides?
mary
192 votes, 4 answers

Binning a column with pandas

I have a data frame column with numeric values: df['percentage'].head() 46.5 44.2 100.0 42.12 I want to see the column as bin counts: bins = [0, 1, 5, 10, 25, 50, 100] How can I get the result as bins with their value counts? [0, 1] bin amount [1,…
Night Walker
127 votes, 6 answers

Pandas: convert categories to numbers

Suppose I have a dataframe with countries that goes as: cc | temp US | 37.0 CA | 12.0 US | 35.0 AU | 20.0 I know that there is a pd.get_dummies function to convert the countries to 'one-hot encodings'. However, I wish to convert them to indices…
sachinruk
92 votes, 10 answers

Getting data for histogram plot

Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query: select total, count(total) from faults GROUP BY total; The data that is being generated is good enough but there are just too many rows. What I need is a…
Legend
48 votes, 3 answers

Is cut() style binning available in dplyr?

Is there a way to do something like a cut() function for binning numeric values in a dplyr table? I'm working on a large postgres table and can currently either write a case statement in the sql at the outset, or output unaggregated data and apply…
Michael Williams
40 votes, 5 answers

resize with averaging or rebin a numpy 2d array

I am trying to reimplement in python an IDL function: http://star.pst.qub.ac.uk/idl/REBIN.html which downsizes by an integer factor a 2d array by averaging. For example: >>> a=np.arange(24).reshape((4,6)) >>> a array([[ 0, 1, 2, 3, 4, 5], …
Andrea Zonca
35 votes, 4 answers

Define and apply custom bins on a dataframe

Using Python, I have created the following data frame, which contains similarity values: cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard 1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000…
add-semi-colons
32 votes, 4 answers

Categorize numeric variable into group/ bins/ breaks

I am trying to categorize a numeric variable (age) into groups defined by intervals so it will not be continuous. I have this code: data$agegrp(data$age >= 40 & data$age <= 49) <- 3 data$agegrp(data$age >= 30 & data$age <= 39) <-…
leian
26 votes, 3 answers

Bin pandas dataframe by every X rows

I have a simple dataframe which I would like to bin for every 3 rows. It looks like this: col1 0 2 1 1 2 3 3 1 4 0 and I would like to turn it into this: col1 0 2 1 0.5 I have already posted a similar…
TheChymera
20 votes, 2 answers

Mapping ranges of values in pandas dataframe

Apologies if this has been asked before, but I looked extensively without results. import pandas as pd import numpy as np df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a']) a 0 7 1 8 2 8 3 3 4 1 5 1 6 2 7 8 8 …
E. Sommer
20 votes, 3 answers

Python: Checking to which bin a value belongs

I have a list of values and a list of bin edges. Now I need to check, for all values, which bin they belong to. Is there a more pythonic way than iterating over the values and then over the bins and checking if the value belongs to the current bin,…
frixhax
19 votes, 8 answers

numpy 1D array: mask elements that repeat more than n times

Q: given an array of integers like [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5] I need to mask elements that repeat more than N times. The goal is to retrieve the boolean mask array. I came up with a rather complicated solution: import…
FObersteiner
19 votes, 1 answer

Better binning in pandas

I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin. Currently, I'm doing this: x = 5 y = 17 z = 33 filter_values = [x, y, z] filtered_a = df[df.filtercol <= x] a_count =…
monkut
17 votes, 5 answers

Binning of data along one axis in numpy

I have a large two dimensional array arr which I would like to bin over the second axis using numpy. Because np.histogram flattens the array I'm currently using a for loop: import numpy as np arr = np.random.randn(100, 100) nbins = 10 binned =…
obachtos
16 votes, 3 answers

Converting a pandas Interval into a string (and back again)

I'm relatively new to Python and am trying to get some data prepped to train a RandomForest. For various reasons, we want the data to be discrete, so there are a few continuous variables that need to be discretized. I found qcut in pandas, which…
Amanda