First time posting here - I have decided to try to learn Python whilst on Covid-19 forced holidays.

I'm trying to summarise some data from a pretty simple database and have been using the value_counts function.

Rather than running it on every column individually, I'd like to loop it over each one and return a summary table. I can do this using df.apply(pd.value_counts), but I can't work out how to pass parameters to value_counts, as I want to set dropna=False.

Basic example of data I have:

# Import libraries 
import pandas as pd 
import numpy as np

# create list of winners and runnerup
data = [['john', 'barry'], ['john','barry'], [np.nan,'barry'], ['barry','john'],['john',np.nan],['linda','frank']] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['winner', 'runnerup']) 

# print dataframe. 
df

How I was doing the value counts for each column:

#Who won the most?
df['winner'].value_counts(dropna=False)

Output:
john     3
linda    1
barry    1
NaN      1
Name: winner, dtype: int64

How can I pass dropna=False when using the apply function? I like the table it outputs below, but I want the NaN to appear in the list.

#value counts table
df.apply(pd.value_counts)
      winner    runnerup
barry   1.0       3.0
frank   NaN       1.0
john    3.0       1.0
linda   1.0       NaN

#value that is missing from list
#NaN    1.0       1.0

Any help would be appreciated!!

MichaelH
  • Does this answer your question? [python pandas: apply a function with arguments to a series](https://stackoverflow.com/questions/12182744/python-pandas-apply-a-function-with-arguments-to-a-series) – wwii Apr 03 '20 at 23:04

2 Answers

You can use df.apply, like this:

df.apply(pd.value_counts, dropna=False)
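
A minimal sketch of why this works, assuming the sample data from the question: DataFrame.apply forwards any extra keyword arguments to the function it calls on each column. The sketch uses pd.Series.value_counts rather than the top-level pd.value_counts, which recent pandas versions flag as deprecated; the lambda form and the names counts, counts_lambda and nan_row are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: winner and runner-up per game.
data = [['john', 'barry'], ['john', 'barry'], [np.nan, 'barry'],
        ['barry', 'john'], ['john', np.nan], ['linda', 'frank']]
df = pd.DataFrame(data, columns=['winner', 'runnerup'])

# Extra keyword arguments given to apply are passed on to the function,
# so each column is counted with dropna=False.
counts = df.apply(pd.Series.value_counts, dropna=False)

# Equivalent form: wrap the method call in a lambda.
counts_lambda = df.apply(lambda col: col.value_counts(dropna=False))

# Select the row whose label is NaN; it is now part of the summary table.
nan_row = counts[counts.index.isna()]
print(counts)
```

Writing df.apply(pd.value_counts(dropna=False)) does not work because it calls the function straight away, without a Series, instead of handing the function itself to apply.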
Joe Mayo
  • thanks! that worked for me. I was trying to put the 'dropna' inside the value counts. e.g. df.apply(pd.value_counts(dropna=False)). – MichaelH Apr 07 '20 at 10:40

With the pandas apply function, if the function you pass takes a single parameter, you simply do:

.apply(func_name)

With Series.apply, that parameter is the value of each cell; with DataFrame.apply, it is each column (or each row, with axis=1) as a Series. This works exactly the same way for pandas built-in functions as for user-defined functions (UDFs).

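When the function takes more than one parameter, pass the extra positional arguments through args (keyword arguments such as dropna=False can be passed to apply directly):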

.apply(func_name, args=(arg1, arg2, arg3, ...))

See: this link
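
As a rough illustration of the difference between args and keyword arguments, here is a small sketch using Series.apply. The function scale and its parameters factor and offset are hypothetical, made up for this example only.

```python
import pandas as pd

# Hypothetical user-defined function: the first parameter receives each
# cell value, the remaining ones are supplied through apply.
def scale(value, factor, offset=0):
    return value * factor + offset

s = pd.Series([1, 2, 3])

# Extra positional arguments go in the args tuple.
by_position = s.apply(scale, args=(10,))

# Extra keyword arguments are passed to apply directly.
by_keyword = s.apply(scale, factor=10, offset=1)

print(by_position.tolist())  # [10, 20, 30]
print(by_keyword.tolist())   # [11, 21, 31]
```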

zafrin
  • Thanks for the explanation, appreciate your quick response. I didn't quite understand the documentation the first time I reviewed it, I was thinking I had to put the 'dropna=False' as an 'arg' but now I understand (I think) that the dropna is a parameter and the args are related to positions. Will have to do some more reading! – MichaelH Apr 07 '20 at 10:15
  • Glad it helped. If you have not done much object-oriented programming (OOP), I would encourage you to look at at least some basics. This will make you more comfortable with passing self as an argument. – zafrin Apr 07 '20 at 14:54