6

This question is about how Excel's COUNTIF function treats different data types when used as an array formula.

There are lots of good posts out there detailing how to use COUNTIF for tasks such as extracting unique values from a list, for example this post. I've managed to use examples from this and other posts to solve specific problems, but I'm trying to get a deeper understanding of array formulas in order to adapt my formulas to new needs.

I came across a peculiar behavior of COUNTIF. In general, Excel seems to treat strings as "larger than" numbers, so that the following examples are valid:

Cell Formula      Returns
=1<2              TRUE
="a"<"b"          TRUE
="a">"b"          FALSE
=1<"b"            TRUE

Now, suppose range A1:A6 contains the following data set:

1
2
3
A
B
C   

For each cell in this set, I want to check how many of all the cells in the set that are smaller than or equal to that cell (a useful technique in more complex formulas). I enter the following array formula in range B1:B6:

{=COUNTIF($A$1:$A$6,"<="&$A$1:$A$6)} (CTRL + SHIFT + ENTER)

Based on the examples above comparing numbers and strings (also illustrated in Column D below), I would expect the output shown below to look like Column C. However, the array formula returns the result shown in Column B, which suggests that strings and number elements are counted separately by arraywise COUNTIF.

Column A     Column B     Column C     Column D
1            1            1            A1<"C" = TRUE
2            2            2            A2<"C" = TRUE
3            3            3            A3<"C" = TRUE
A            1            4            A4<"C" = TRUE
B            2            5            A5<"C" = TRUE
C            3            6            A6<"C" = FALSE

So the question is how to produce the output in Column C? (EDIT: Just to clarify, I'm specifically looking for solutions that make use of COUNTIF's array properties.)

Any insight into why arraywise COUNTIF apparently behaves differently than the single-cell examples would also be much appreciated.

NOTE: I've translated the examples from a non-English version of Excel, so I apologize in advance for any typos.

PS. For a background, I ran into this problem when I tried to build a formula that would both extract unique values from a list with possible duplicates, and sort the unique values in numerical/alphabetical order. My current solution is to do this in two steps. One solution for how to do it in one step is proposed here.

Community
  • 1
  • 1
Egalth
  • 962
  • 9
  • 22
  • I believe you may be using `COUNTIF` incorrectly. – Tim Biegeleisen Oct 27 '15 at 01:37
  • The syntax `{=COUNTIF($A$1:$A$6,"<="&$A$1:$A$6)}` is actually extracted from a much more complex formula that behaves precisely as expected, and which as been used by others to solve similar problems. This post illustrates an intermediate step in that formula. – Egalth Oct 27 '15 at 01:44

4 Answers4

3

First of all, excellently laid-out question, and on an interesting topic to boot.

I also raised an eyebrow when I first came across this behaviour of the COUNTIF(S)/SUMIF(S) functions. In their defence, I suppose we could construct situations in which we actually want strings and numerics to be considered separately.

In order to construct your required in-formula array, you will need something like:

MMULT(0+(TRANSPOSE($A$1:$A$6)<=$A$1:$A$6),ROW($A$1:$A$6)^0)

though note that the necessary transposition will mean that any set-up which includes this construction will require committing with CSE.

Regards

XOR LX
  • 7,632
  • 1
  • 16
  • 15
2

The different behavior can easily shown if you compare

=COUNTIF($A$1:$A$6,"<=A")

with

{=COUNT(IF($A$1:$A$6<="A",1))}

The first will only get text values from $A$1:$A$6 because it is clearly text to compare and it is faster ignoring other values then. =COUNTIF($A$1:$A$6,"<=3") will only get numeric values from $A$1:$A$6 because of the same reasons. Even if the criterion would be a concatination with a cell reference, then the concatination would be the first process and would lead either to "<=A" or "<=3". So it is ever clear what to compare, text or numbers.

The second first needs an array of the comparisons, then performs the IF, gets so an array of 1 or FALSE and counts then. But the "A" could also be a cell reference. So it is not clear what to compare at the beginning and the first array has to compare all values in $A$1:$A$6.

So COUNTIF(S) and SUMIF(S) cant be used comparing mixed text and numeric data.

The solution is shown already by XOR LX.

Btw.: with your PS. For a background you should consider the following solution from an German Excel site: http://www.excelformeln.de/formeln.html?welcher=236.

In your linked example:

Formula in B2 downwards

{=INDEX($A$2:$A$99,MATCH(LARGE(COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),ROWS($1:1)),COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),0))&""}

In this solution the COUNTIF compares with >= so the biggest text or number will count lowest and so get the lowest position. All number positions are added with 99. So they are ever greater than all possible text positions. So we have a descended sorted array. Then, using LARGE, the list is created from the highest to the lowest position.

Axel Richter
  • 56,077
  • 6
  • 60
  • 87
0

I doubt that countif is the right function for what you want to achieve here.

try this (ctrl+shift+enter):

={SUM(IF(A1>=$A$1:$A$6,1,0))}

You will get

1
2
3
4
5
6

PS: CountIf is an basically an array function internally. Using it in another array function results into multiple array functions and their behaviour becomes complex. Array functions are best used with clear logical path.

As tested in Excel 2013, you will only get 1 in all results instead of what was proposed in Column B.

Currently, in the function provided by you, countif cannot figure out which cell to compare to which cell. Array functions expand ranges and then perform the provided action. Therefore, it is comparing each cell to same cell and resulting into 1.

Som Shekhar
  • 138
  • 10
  • Perhaps you can clarify your comment "I doubt that countif is the right function for what you want to achieve here."? On what basis do you doubt that? What insights led you to that conclusion, which might also be useful to the OP in helping them understand the behaviour of COUNTIF that they have witnessed? – XOR LX Oct 27 '15 at 08:39
  • Also, why is the cell reference A1 in parentheses? – XOR LX Oct 27 '15 at 08:40
  • parentheses are not required. – Som Shekhar Oct 27 '15 at 09:12
  • As for only getting a vector of only 1s in Column B, this is the result returned if the formula is entered *only* into cell B1 and then copy-pasted down. To reproduce Column B you need to mark *the full range* B1:B6 before the formula is entered. – Egalth Oct 27 '15 at 11:56
  • Yes, point taken, in this simple example COUNTIF is not strictly necessary. But it is a useful construct in more complex formulas, for example when the criteria is a dynamic range. – Egalth Oct 27 '15 at 11:59
  • Ok. got it. Your purpose was to understand the issue COUNTIF function instead of the results. I think Axel explained it perfectly. – Som Shekhar Oct 27 '15 at 13:20
0

Try this FormulaArray in B1 then copy till B6:

=SUM(($A$1:$A$6<=$A1)*1)
EEM
  • 6,601
  • 2
  • 18
  • 33
  • Thanks, that certainly works for reproducing Column C. (I should have been more specific in that I was looking for solutions with COUNTIF in particular, which can be extended to situations with dynamic ranges for the critera.) – Egalth Oct 27 '15 at 18:13