removing NA and blank lines from rows in column

Question

I can not get this working with my R code.

I am trying to remove all blank spaces and NA values from the column Weapon.Description in the crimeData dataset. Then print out the head of the column (minus the blanks and NAs).

Here is my current code although it doesn't remove blanks for some reason:

crimeData[!(!is.na(crimeData$Weapon.Description) & 
crimeData$Weapon.Description==""), ]
head(crimeData$Weapon.Description)

[1] STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)
[2]                                            
[3] 
[4]                                                                                             
[5]STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)

It would help if you provided sample data. Otherwise, the first answer to this question is `Error: object 'crimeData' not found`. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. — r2evans, Dec 04 '18 at 18:05
BTW: you might reduce the conditional with `!crimeData$Weapon.Description %in% c("", NA)`. — r2evans, Dec 04 '18 at 18:06
Thank you for your reply. I am using the following kaggle dataset: https://www.kaggle.com/cityofLA/crime-in-los-angeles — Craig P H, Dec 04 '18 at 18:08
That's great. I'm not going to download a 60MB dataset for kicks, esp since I don't have a kaggle login handy. If you want quick answers, please read the links and use an easily-copied method for providing *sample data*. I'd think `dput(head(crimeData[15:18]))` would work, or pick a handful of rows where some have values to filter out, and the others do not. — r2evans, Dec 04 '18 at 18:12
That edit is a start, but it still does not fulfill the need. Really, please read *at least* the first link. Asking questions that make it easy for somebody to answer correctly and quickly will help you a lot. I suspect my answer below will suffice, but if not you are unlikely to get anything more until you provide better sample data. — r2evans, Dec 04 '18 at 18:21
You could include a line in `read.csv()` that makes blanks read in as NA? `crimeData = read.csv(path, na.strings = c("", "NA"))` Then use `complete.cases` or `NA.omit` with your data — NColl, Dec 04 '18 at 18:37
@NColl - This works perfect and an easy solution. Thank you very much! Could you post an answer so I can accept it? — Craig P H, Dec 04 '18 at 18:42

r2evans · Answer 1 · 2018-12-04T18:49:03.417

2

Your logic is a little off:

!is.na(crimeData$Weapon.Description) gives you rows where the content is not NA
crimeData$Weapon.Description=="") gives you rows where the content is the empty string
!(...&...) is negating it, so you are asking for all rows that are "not (not-NA and empty)"

Perhaps you want "not (NA or empty)"? One of these should work:

crimeData[ !crimeData$Weapon.Description %in% c("", NA), ]
crimeData[ !is.na(crimeData$Weapon.Description) & crimeData$Weapon.Description != "", ]
crimeData[ !( is.na(crimeData$Weapon.Description) | crimeData$Weapon.Description == ""), ]
crimeData[ !is.na(crimeData$Weapon.Description) & nzchar(crimeData$Weapon.Description), ]

Not sure if it's a factor, but this won't catch "only 1+ spaces". If that's the case, you might want to first do

crimeData$Weapon.Description <- trimws(crimeData$Weapon.Description)

edited Dec 04 '18 at 18:49

answered Dec 04 '18 at 18:18

r2evans

141,215
6
77
149

Thanks again for the reply. I have tried the following: head(crimeData[ !crimeData$Weapon.Description %in% c("", NA), ]). This does indeed remove both NAs and blank spaces, although it returns the head of the whole crimeData dataset instead of just the head of Weapon.Description column. Any idea why? – Craig P H Dec 04 '18 at 18:28
Welcome to R! If you want *just* that column, then specific it! `head(crimeData[...,"Weapon.Description"])` or if you want permanent removal, `crimeData <- crimeData[!...,]` then go for your column. – r2evans Dec 04 '18 at 18:48
(Dang fingers ... I meant "specify it", ... can't blame my phone, either ...) – r2evans Dec 04 '18 at 18:55

score 1 · Accepted Answer · answered Dec 04 '18 at 18:45

1

Reading all blanks in as NA then using na.omit to remove

crimeDate <- read.csv(file=file.choose(), na.strings = c("", "NA"))
na.omit(crimeDate$Weapon.Description)

answered Dec 04 '18 at 18:45

NColl

757
5
19

score 0 · Answer 3 · answered Dec 04 '18 at 18:10

0

I'm assuming you're trying to remove the entire row of NAs. If that's the case, then try this:

crimeData[which(is.na(crimeData$Weapon.Description) == FALSE),]

The which is a logical; setting the is.na() to FALSE means that all NAs will be removed.

answered Dec 04 '18 at 18:10

Yehuda

1,787
2
15
49

Really, `which(...) == FALSE` is just needless. `crimeData[!is.na(crimeData$Weapon.Description),]`. But the OP also mentioned `""`. – r2evans Dec 04 '18 at 18:11
that has removed all NAs, but I also need to remove blank spaces as some rows in the weapons column were left blank e.g. [1] STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE) [3] [5] STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE) – Craig P H Dec 04 '18 at 18:14
1

@r2evans You're right, I guess I've just gotten used to the `which` function. Using your `& crimeData$Weapon.Description !- ""` code that you pasted above will work here to remove the extra white space; the `|` operator doesn't seem to. – Yehuda Dec 04 '18 at 18:33
I fixed the `|` issue in the last line, thanks. I often see `== TRUE` and `== FALSE`, and though it's explicit and completely unambiguous, it also seems unnecessarily verbose. It works, certainly, perhaps my suggestion is more about *style* than *proper code*. – r2evans Dec 04 '18 at 18:50

removing NA and blank lines from rows in column

3 Answers3