3

The following is a small extract of a large excel table with many weeks of data:

                                                 Q
+--------------+--------------------------+------------+
|   Keyword    |        searches          |    KW45    |
+--------------+--------------------------+------------+
| durni coat   |                      110 |          1 |
| gliss coat   |                       40 |          2 |
| hart coat    |                       70 |          7 |
| kepla coat   |                       50 |          3 |
| lasox coat   |                        0 |          1 |
| magna coat   |                       10 |            |
| magoxid coat |                       10 |          2 |
| plasma coat  |                       50 |          3 |
| selga coat   |                        0 |          1 |
| sila coat    |                        0 |         25 |
| temp coat    |                       10 |         50 |
+--------------+--------------------------+------------+


Currecntly I use these functions in Excel in columns C :

    =COUNTIFS(Q:Q;"=1";;INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;"=2";;INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;"=3";;INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;">=4";Q:Q;"<=5";INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;">=6";Q:Q;"<=10";INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;">=11";Q:Q;"<=20";INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
    =COUNTIFS(Q:Q;">=21";Q:Q;"<=30";INDEX($Q:$T;;MATCH(C1;$Q$1:$T$1;0));">0")
and so on....

which match the week and postion range to create the following table:


    |    B       |   C  | 
    | Position   | KW45 | 
    |------------|------|
    | Pos. 1     | 3    | 
    | Pos. 2     | 2    | 
    | Pos. 3     | 2    | 
    | Pos. 4-5   | 0    |
    | Pos. 6-10  | 1    |
    | Pos. 11-20 | 0    |
    | Pos. 21-30 | 1    |
    | Pos. 31-40 | 0    |
    | Pos. 41-50 | 1    |


How would I go about to acheive the above functuon in pandas?

My goal ist to count the number of occurances of:

value 1 in column Q
value 2 in column Q
value 3 in column Q
value range 4-5 in column Q
value range 6-10 in column Q
value range 11-20 in column Q

and so on... 

ufulu
  • 83
  • 4
  • Can you add, which column is `Q` etc. A bit more information on what is counted etc would be nice – LeoE Nov 28 '19 at 21:15
  • tbh, I don't think Excel is that common of a language in Stackoverflow. I can't even tell what that line does. – Quang Hoang Nov 28 '19 at 21:16
  • yes I am terriby sorry, I already noticed that I had forgotten that bit of information and added it. – ufulu Nov 28 '19 at 21:20
  • Can you break down your countifs, for evaluation of what it does, i'm not that familiar with excel formulas – oppressionslayer Nov 28 '19 at 21:20
  • Where do you get your ranges from? Or is it from the column B? Or is it always the same as in your result table? – LeoE Nov 28 '19 at 21:20
  • I defind the ranges and put them in column B. I want to count how many occurances of value 1 are in column Q, how many orrcuances of value 2 in column Q and then change to ranges for 4-5, 6-10, 11-20 and so on. – ufulu Nov 28 '19 at 21:25

2 Answers2

2

My goal ist to count the number of occurances:

(pd.cut(df['KW45'], bins=[0, 1, 2, 3, 5, 10, 20, 30, 40, 50])
   .value_counts(sort=False)
)

Output:

(0, 1]      4
(1, 2]      2
(2, 3]      2
(3, 5]      0
(5, 10]     1
(10, 20]    0
(20, 30]    0
(30, 40]    0
(40, 50]    1
Name: KW45, dtype: int64
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • well that is a very good solution. Thank you. I read about and played with bins but some how didn't get it to work. – ufulu Nov 28 '19 at 21:57
0

I think I'm understanding this -- why not use np.where() and sum() so from your example -- import excel as dataframe

df = pd.read_excel('path_here')

then create columns of counts (or ranges) ---

#Syntax for np.where 
#column = np.where(condition, if_true, if_false)

#Creates a cumulative count
df['val1_sum'] = np.where(df['Q'] == value1,1,0).cumsum()

or to just take the sum

sum_val1 = np.where(df['Q'[ == value1, 1, 0).sum()

#Range based 
range_1 = np.where((df['Q'] > 5) & (df['Q'] <= 10),1,0).sum()

I like simple, and modular -- and this is easy to follow.

If you're counting lots of different values, I would use count() w/ multii-index, or nunique() -- this doesn't solve range problem, but you could further filter with np.where() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html

Count unique values with pandas per groups

df.groupby('Q')['col_to_return'].nunique()
Zach Oakes
  • 185
  • 14