Questions tagged [missing-data]

For questions about missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may never be observed, may be corrupted, or may not yet be available. Treating such data requires additional annotation as well as methodological care when deciding how to impute the missing values or use the data in standard analyses. This becomes a particular problem in data-intensive contexts, such as large statistical analyses of databases.
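As a rough illustration of the annotation idea, the sketch below marks missing cells in a small table and derives a mask from them. The table contents, the column names, and the `is_missing` helper are all hypothetical, and using `None` and NaN as markers is only one possible convention.

```python
import math

# Hypothetical table: None marks a value that was never observed,
# float("nan") marks a numeric value that was recorded but is unusable.
rows = [
    {"id": 1, "height": 1.82, "weight": 80.0},
    {"id": 2, "height": None, "weight": 65.5},
    {"id": 3, "height": 1.70, "weight": float("nan")},
]


def is_missing(value):
    """Return True for None or a float NaN (illustrative helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Annotation: a parallel mask recording which cells are missing.
mask = [{key: is_missing(val) for key, val in row.items()} for row in rows]

# Per-column missing counts, computed from the mask.
missing_counts = {key: sum(m[key] for m in mask) for key in rows[0]}

# Complete-case selection: keep only rows without any missing cell.
complete_rows = [row for row, m in zip(rows, mask) if not any(m.values())]

print(missing_counts)
print([row["id"] for row in complete_rows])
```

Keeping the mask separate from the values means the original data stay untouched, so a later imputation step can still tell filled-in cells from observed ones.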

Missing data occur in many fields, from survey data to industrial data. There are many underlying missing data mechanisms (reasons why the data are missing). In survey data, for example, data might be missing due to drop-out; for instance, people answering the survey might run out of time.

Rubin classified missing data into three types:

missing completely at random;
missing at random;
missing not at random.

Note that some statistical analyses are valid only under certain of these classes.
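The three classes can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the `age` and `income` variables, the linear relation between them, and the missingness probabilities are invented for illustration. Here MCAR means the chance of being missing is constant, MAR means it depends only on the always-observed `age`, and MNAR means it depends on the unobserved `income` itself.

```python
import random
import statistics


def simulate(mechanism, n=10_000, seed=0):
    """Return (age, true_income, observed_income) triples; None marks missing."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.uniform(20, 70)                      # always observed
        income = 1000 + 50 * age + rng.gauss(0, 500)   # may go missing
        if mechanism == "MCAR":
            p_missing = 0.3                            # independent of everything
        elif mechanism == "MAR":
            p_missing = 0.6 if age > 45 else 0.1       # depends on observed age
        elif mechanism == "MNAR":
            p_missing = 0.6 if income > 3250 else 0.1  # depends on income itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        observed = None if rng.random() < p_missing else income
        records.append((age, income, observed))
    return records


# Bias of the naive "drop the missing values" mean under each mechanism.
bias = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    records = simulate(mechanism)
    true_mean = statistics.fmean(income for _, income, _ in records)
    observed_mean = statistics.fmean(
        obs for _, _, obs in records if obs is not None
    )
    bias[mechanism] = observed_mean - true_mean
    print(f"{mechanism}: bias of complete-case mean = {bias[mechanism]:.1f}")
```

In this setup the complete-case mean should stay close to the true mean only under MCAR. Under MAR the bias could in principle be corrected by conditioning on `age`, whereas under MNAR the observed data alone do not carry enough information to do so.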

2809 questions

1068

votes

20 answers

Remove rows with all or some NAs (missing values) in data.frame

I'd like to remove the lines in this data frame that: a) contain NAs across all columns. Below is my example data frame. gene hsap mmul mmus rnor cfam 1 ENSG00000208234 0 NA NA NA NA 2 ENSG00000199674 0 2 2 2 …

asked Feb 01 '11 at 11:52

Benoit B.

11,854
8
26
29

149

votes

10 answers

How to lowercase a pandas dataframe string column if it has missing values?

The following code does not work. import pandas as pd import numpy as np df=pd.DataFrame(['ONE','Two', np.nan],columns=['x']) xLower = df["x"].map(lambda x: x.lower()) How should I tweak it to get xLower = ['one','two',np.nan] ? Efficiency is…

python string pandas missing-data

asked Mar 07 '14 at 08:34

P.Escondido

3,373
6
23
29

votes

2 answers

str.format() raises KeyError

The following code raises a KeyError exception: addr_list_formatted = [] addr_list_idx = 0 for addr in addr_list: # addr_list is a list addr_list_idx = addr_list_idx + 1 addr_list_formatted.append(""" "{0}" { …

python syntax string-formatting delimiter missing-data

asked May 02 '10 at 22:06

Dor

7,344
4
32
45

votes

14 answers

Elegant way to report missing values in a data.frame

Here's a little piece of code I wrote to report variables with missing values from a data frame. I'm trying to think of a more elegant way to do this, one that perhaps returns a data.frame, but I'm stuck: for (Var in names(airquality)) { …

r dataframe missing-data

asked Nov 29 '11 at 20:23

Zach

29,791
35
142
201

votes

7 answers

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced

I am trying to convert a csv into numpy array. In the numpy array, I am replacing a few elements with NaN. Then, I wanted to find the indices of the NaN elements in the numpy array. The code is : import pandas as pd import matplotlib.pyplot as…

python numpy nan missing-data numpy-ufunc

asked Oct 05 '18 at 01:45

Thedeadman619

votes

5 answers

Delete rows with blank values in one particular column

I am working on a large dataset, with some rows with NAs and others with blanks: df <- data.frame(ID = c(1:7), home_pc = c("","CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB","MP9 7GH","KN4 5GH"), …

r dataframe missing-data

asked Feb 03 '12 at 10:06

KT_1

8,194
15
56
68

votes

9 answers

Format string unused named arguments

Let's say I have: action = '{bond}, {james} {bond}'.format(bond='bond', james='james') this will output: 'bond, james bond' Next we have: action = '{bond}, {james} {bond}'.format(bond='bond') this will output: KeyError: 'james' Is there some…

python string string-formatting missing-data defaultdict

asked Jun 20 '13 at 13:50

nelsonvarela

2,310
7
27
43

votes

6 answers

Python, Pandas : Return only those rows which have missing values

While working in Pandas in Python... I'm working with a dataset that contains some missing values, and I'd like to return a dataframe which contains only those rows which have missing data. Is there a nice way to do this? (My current method to do…

python pandas missing-data

asked May 25 '15 at 23:03

user2487726

votes

14 answers

Replace missing values with column mean

I am not sure how to loop over each column to replace the NA values with the column mean. When I am trying to replace for one column using the following, it works well. Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE)) The code for…

r missing-data imputation

asked Sep 14 '14 at 16:50

Nikita

votes

1 answer

Include levels of zero count in result of table()

I have a vector 'y' and I count the different values using table: y <- c(0, 0, 1, 3, 4, 4) table(y) # y # 0 1 3 4 # 2 1 1 2 However, I also want the result to include the fact that there are zero 2's and zero 5's. Can I use table() for…

r count missing-data

asked Oct 24 '09 at 05:31

Christopher DuBois

42,350
23
71
93

votes

3 answers

What is the difference between <NA> and NA?

I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However when I view the factor I get something like this: head(SMOKE) # N N Y Y N # Levels: Y N Why is R displaying NA…

r na missing-data

asked Apr 27 '13 at 15:24

oort

1,840
2
20
29

votes

9 answers

Insert rows for missing dates/times

I am new to R but have turned to it to solve a problem with a large data set I am trying to process. Currently I have 4 columns of data (Y values) set against minute-interval timestamps (month/day/year hour:min) (X values) as below: timestamp …

r time-series missing-data

asked May 28 '13 at 08:12

James A

votes

5 answers

Replace NA with previous or next value, by group, using dplyr

I have a data frame which is arranged by descending order of date. ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23), color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'), age =…

r dplyr missing-data zoo

asked Oct 14 '16 at 10:22

Tarak

1,035
2
8
14

votes

10 answers

How do I get a summary count of missing/NaN data by column in 'pandas'?

In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe does not report these values. I gather I can do something like len(mydata.index) - mydata.count() to compute the number…

pandas reporting nan missing-data

asked Mar 07 '14 at 18:08

orome

45,163
57
202
418

votes

3 answers

Convert NA into a factor level

I have a vector with NA values that I would like to replace by a new factor level NA. a = as.factor(as.character(c(1, 1, 2, 2, 3, NA))) a [1] 1 1 2 2 3 <NA> Levels: 1 2 3 This works, but it seems like a strange way to do it. a =…

r missing-data

asked Nov 28 '14 at 21:12

marbel

7,560
6
49
68
