How can I run Little's test for MCAR (missing completely at random) in Python? I have looked at the R package for this test, but I want to do it in Python. Is there an alternative approach to testing for MCAR?

- What about the `impyute` library? Little's MCAR Test (WIP) is in its feature list. – Istrel Sep 28 '19 at 10:35
- @Istrel The impyute library does not explain how to do it (as far as I have seen). Can you elaborate on the steps or give a link to proper documentation? – Kiran Oct 13 '19 at 09:33
- The impyute library has a ticket to implement Little's MCAR Test, but it's not in progress: https://github.com/eltonlaw/impyute/issues/71 – skeller88 Feb 26 '20 at 03:16
4 Answers
You can use rpy2 to get the mcar test from R. Note that using rpy2 requires some R coding.
Set up rpy2 in Google Colab
# rpy2 libraries
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import globalenv
# Import R's base package
base = importr("base")
# Import R's utility packages
utils = importr("utils")
# Select mirror
utils.chooseCRANmirror(ind=1)
# For automatic translation of Pandas objects to R
pandas2ri.activate()
# Enable R magic
%load_ext rpy2.ipython
# Make your Pandas dataframe accessible to R
globalenv["r_df"] = df
You can now get R functionality within your Python environment by using R magics. Use `%R` for a single line of R code and `%%R` when the whole cell should be interpreted as R code.
To install an R package use:
utils.install_packages("package_name")
You may also need to load it before it can be used:
%R library(package_name)
For Little's MCAR test, we should install the `naniar` package. Its installation is slightly more complicated, as we also need to install `remotes` to download it from GitHub, but for other packages the general procedure should be enough.
utils.install_packages("remotes")
%R remotes::install_github("njtierney/naniar")
Load the `naniar` package:
%R library(naniar)
Pass your `r_df` to the `mcar_test` function:
# mcar_test on whole df
%R mcar_test(r_df)
If an error occurs, try including only the columns with missing data:
%%R
# mcar_test on columns with missing data
r_dfMissing <- r_df[c("col1", "col2", "col3")]
mcar_test(r_dfMissing)
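The column names in the R cell above are placeholders. As a minimal sketch on the pandas side, the columns that contain missing values can be found programmatically before the frame is handed to R. The example frame and its column names are hypothetical; with real data, the subset would be assigned to `globalenv["r_df"]` as in the setup.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; "col1".."col4" are placeholder names.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, 4.0],
    "col2": [np.nan, 2.0, 3.0, 4.0],
    "col3": [1.0, 2.0, np.nan, 4.0],
    "col4": [1.0, 2.0, 3.0, 4.0],  # fully observed
})

# Names of the columns that contain at least one missing value
missing_cols = df.columns[df.isnull().any()].tolist()

# Subset that could be passed to R instead of the full frame
df_missing = df[missing_cols]
print(missing_cols)
```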

- Nice. Can you put a few words on why you would include only variables with missing data? I thought the idea was to assess differences in variables grouped by missing/non-missing, which I cannot imagine will work if we drop cols without missing. – Johan Jun 18 '23 at 13:15
- That's a good question. The only reason I suggested including only variables with missing data is that the mcar_test() function raises an error otherwise. I am not sure if this happens in every situation or just with the data I tried it with. – Akis Hadjimpalasis Aug 23 '23 at 07:18
You can use this function to run a Little's MCAR test instead of using R code:
import numpy as np
import pandas as pd
from scipy.stats import chi2
def little_mcar_test(data, alpha=0.05):
    """
    Performs Little's MCAR (Missing Completely At Random) test on a dataset with missing values.

    Parameters:
        data (DataFrame): A pandas DataFrame with n observations and p variables, where some values are missing.
        alpha (float): The significance level for the hypothesis test (default is 0.05).

    Returns:
        A tuple containing:
        - A matrix of missing values that represents the pattern of missingness in the dataset.
        - A p-value representing the significance of the MCAR test.
    """
    # Calculate the proportion of missing values in each variable
    p_m = data.isnull().mean()
    # Calculate the proportion of complete cases for each variable
    p_c = data.dropna().shape[0] / data.shape[0]
    # Calculate the correlation matrix for all pairs of variables that have complete cases
    R_c = data.dropna().corr()
    # Calculate the correlation matrix for all pairs of variables using all observations
    R_all = data.corr()
    # Calculate the difference between the two correlation matrices
    R_diff = R_all - R_c
    # Calculate the variance of the R_diff matrix
    V_Rdiff = np.var(R_diff, ddof=1)
    # Calculate the expected value of V_Rdiff under the null hypothesis that the missing data is MCAR
    E_Rdiff = (1 - p_c) / (1 - p_m).sum()
    # Calculate the test statistic
    T = np.trace(R_diff) / np.sqrt(V_Rdiff * E_Rdiff)
    # Calculate the degrees of freedom
    df = data.shape[1] * (data.shape[1] - 1) / 2
    # Calculate the p-value using a chi-squared distribution with df degrees of freedom and the test statistic T
    p_value = 1 - chi2.cdf(T ** 2, df)
    # Create a matrix of missing values that represents the pattern of missingness in the dataset
    missingness_matrix = data.isnull().astype(int)
    # Return the missingness matrix and the p-value
    return missingness_matrix, p_value
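For comparison with the correlation-based function above, here is a minimal sketch of the statistic as Little (1988) defines it: rows are grouped by missingness pattern, each pattern's observed-variable means are compared with the overall means, and the squared differences are weighted by the inverse covariance and the pattern size. The degrees of freedom are the total number of observed variables summed over patterns, minus the number of variables. One loud assumption: `DataFrame.mean()` and the pairwise-complete `DataFrame.cov()` stand in for the maximum-likelihood (EM) estimates that the original test uses, so the result may differ from R's `mcar_test`. The name `little_mcar_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2


def little_mcar_sketch(data):
    """Return (statistic, degrees of freedom, p-value) for a Little-style test.

    Assumption: available-case means and pairwise covariances replace the
    EM-based maximum-likelihood estimates of Little's original test.
    """
    data = pd.DataFrame(data).dropna(how="all")  # drop rows with no observed value
    grand_mean = data.mean()   # per-column mean over observed values
    grand_cov = data.cov()     # pairwise-complete covariance matrix
    observed = data.notnull()

    # Encode each row's missingness pattern as a string such as "101"
    pattern_key = observed.apply(
        lambda row: "".join("1" if v else "0" for v in row), axis=1
    )

    statistic = 0.0
    dof = -data.shape[1]  # df = sum of observed counts per pattern, minus p
    for key, rows in data.groupby(pattern_key.to_numpy()):
        cols = [c for c, flag in zip(data.columns, key) if flag == "1"]
        diff = (rows[cols].mean() - grand_mean[cols]).to_numpy()
        cov = grand_cov.loc[cols, cols].to_numpy()
        # n_j * diff' * inv(cov) * diff, via a linear solve instead of an inverse
        statistic += len(rows) * diff @ np.linalg.solve(cov, diff)
        dof += len(cols)

    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value


# Small demo on synthetic data with values deleted independently of the data
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
demo.iloc[:40, 0] = np.nan
demo.iloc[40:70, 1] = np.nan
print(little_mcar_sketch(demo))
```

The function returns a single statistic and a single p-value for the whole dataset. With only one missingness pattern the degrees of freedom are zero and the p-value is undefined, and a singular covariance submatrix will make the linear solve fail.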

- Cool. What df do you expect as input? And I thought Little's test should return one test with one p-value, not one per column. – Johan Jun 18 '23 at 13:25
Comments suggest using existing packages. Here is an example taken directly from `pyampute`:
import pandas as pd
from pyampute.exploration.mcar_statistical_tests import MCARTest
data_mcar = pd.read_table("data/missingdata_mcar.csv")
mt = MCARTest(method="little")
print(mt.little_mcar_test(data_mcar))
0.17365464213775494
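The CSV file in that example belongs to the pyampute repository. As a hedged sketch for trying the test without it, a frame with MCAR missingness can be simulated by deleting each cell with the same probability, independently of any values. The column names, sample size, and 20% missingness rate are arbitrary assumptions; the resulting `data_mcar` could then be passed to `mt.little_mcar_test` in place of the CSV data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Fully observed synthetic data: 500 rows, 3 hypothetical numeric columns
complete = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "c"])

# MCAR: every cell is deleted with probability 0.2, regardless of its value
mask = rng.random(complete.shape) < 0.2
data_mcar = complete.mask(mask)

# Share of missing values per column
print(data_mcar.isnull().mean())
```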

import numpy as np
import pandas as pd
from scipy.stats import chi2
def little_mcar_test(data, alpha=0.05):
    """
    Performs Little's MCAR (Missing Completely At Random) test on a dataset with missing values.
    """
    data = pd.DataFrame(data)
    data.columns = ['x' + str(i) for i in range(data.shape[1])]
    data['missing'] = np.sum(data.isnull(), axis=1)
    n = data.shape[0]
    k = data.shape[1] - 1
    df = k * (k - 1) / 2
    chi2_crit = chi2.ppf(1 - alpha, df)
    chi2_val = ((n - 1 - (k - 1) / 2) ** 2) / (k - 1) / ((n - k) * np.mean(data['missing']))
    p_val = 1 - chi2.cdf(chi2_val, df)
    if chi2_val > chi2_crit:
        print(
            'Reject null hypothesis: Data is not MCAR (p-value={:.4f}, chi-square={:.4f})'.format(p_val, chi2_val)
        )
    else:
        print(
            'Do not reject null hypothesis: Data is MCAR (p-value={:.4f}, chi-square={:.4f})'.format(p_val, chi2_val)
        )
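The decision rule in the function above compares the statistic with a critical value, which is equivalent to comparing the p-value with `alpha`. A minimal sketch of that equivalence follows; the statistic and degrees of freedom are made-up numbers, and `chi2.sf` is used as a numerically more stable form of `1 - chi2.cdf`.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 3            # hypothetical degrees of freedom
statistic = 9.2    # hypothetical chi-square statistic

critical_value = chi2.ppf(1 - alpha, dof)  # rejection threshold at level alpha
p_value = chi2.sf(statistic, dof)          # survival function, i.e. 1 - cdf

# Both comparisons lead to the same decision about the null hypothesis
reject_by_critical_value = statistic > critical_value
reject_by_p_value = p_value < alpha
print(reject_by_critical_value, reject_by_p_value)
```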
