I have a data frame with 30k observations, but I think one of my columns contains only NA values. How can I check whether that specific column has only NA values? With so many observations, I can't check them without code.
3 Answers
1
We can compare the NA count against the total row count for each column, which yields TRUE when the two counts are equal. Then subset the names of the data frame, retaining only those columns whose values are all NA.
names(df)[sapply(df, function(x) sum(is.na(x)) == length(x))]
[1] "v3"
Data:
df <- data.frame(v1=c(1,2,3), v2=c(4,NA,6), v3=c(NA,NA,NA))
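For the single-column case the question asks about, a minimal base R sketch follows. It assumes the example data frame above; the helper variable `all_na` is a name introduced here for illustration only, and `all(is.na(x))` is swapped in as an equivalent of the count comparison.

```r
# Example data from this answer; v3 is the column that is entirely NA
df <- data.frame(v1 = c(1, 2, 3), v2 = c(4, NA, 6), v3 = c(NA, NA, NA))

# Check one specific column: TRUE only if every value in it is NA
all(is.na(df$v3))

# Same check applied to every column; vapply() guarantees one logical per column
all_na <- vapply(df, function(x) all(is.na(x)), logical(1))

# Names of the columns that contain nothing but NA
names(df)[all_na]
```

Note that `all(is.na(x))` also returns TRUE for a zero-length column, so an empty data frame would report every column as all-NA.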

Tim Biegeleisen
0
In Python (with pandas), if your column is df['column_name'], you can use this if statement:
if df['column_name'].isnull().sum() == df['column_name'].shape[0]:
    print('All nulls')

kotbegemot
The question is tagged with R, not python, how is this relevant? – zx8754 Jan 16 '20 at 10:57
0
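Try this one:
library(dplyr)  # provides tibble(), %>% and summarise_all()
data <- tibble(
  a = rnorm(1000, 5, 1),
  b = NA,
  c = c(NA, rnorm(999, 10, 5))
)
# For each column, TRUE if every value is NA
data %>%
  summarise_all(~all(is.na(.)))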
# A tibble: 1 x 3
  a     b     c
  <lgl> <lgl> <lgl>
1 FALSE TRUE  FALSE
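In dplyr 1.0.0 and later, `summarise_all()` is superseded by `across()`. A hedged sketch of the same check in that style is below. It assumes dplyr >= 1.0.0 is installed, and uses a small fixed tibble (made up here for illustration, not the random data above) so the result is reproducible.

```r
library(dplyr)

# Small illustrative tibble (hypothetical values): b is entirely NA
data <- tibble(
  a = c(1.5, 2.0, 3.2),
  b = NA,
  c = c(NA, 10, 12)
)

# One logical per column: TRUE if every value in that column is NA
result <- data %>%
  summarise(across(everything(), ~ all(is.na(.x))))

result
```

Here `result` is a one-row tibble in which only `b` is TRUE.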

jyjek