7

I have a CSV file

Identity,Number,Data,Result,Add,,,,,,,,,,,,
1,,,,4,55,,92,,,,,,,,,62,
3,,,,7,43,,12,,,,,,,,,74,
7,,,,3,58,,52,,,,,,,,,64,
0,,,,6,10,,22,,,,,,,,,96,
3,,,,8,13,,92,,,,,,,,,22,

How to remove empty columns in R?

Desired Output

Identity,Number,Data,Result,Add
1,4,55,92,62
3,7,43,12,74
7,3,58,52,64
0,6,10,22,96
3,8,13,92,22
user3188390
  • 603
  • 2
  • 11
  • 19

8 Answers8

15

After you've imported your data (using the method the other answerer suggested) run this command, substituting mydf for whatever you decide to call your data frame:

#Remove empty columns
mydf <- Filter(function(x)!all(is.na(x)), mydf)
americo
  • 1,013
  • 8
  • 17
  • For some reasons, this does not work with the example from this post https://www.codingprof.com/3-easy-ways-to-remove-empty-columns-in-r – Tung Jun 30 '22 at 14:35
3

Convert the empty cell to NA and then remove the NAs

mydata[mydata == ""] <- NA
mydata<- mydata[,colSums(is.na(mydata)) < <a desired filter value>]
Pramit
  • 1,373
  • 1
  • 18
  • 27
2

A more complex removal logic is like this:

mydf <- Filter(function(x)!all(is.na(x) || is.null(x) || x == "" || x == 0), mydf)
R Keene
  • 71
  • 2
2

Try remove_empty from janitor

Using the input from Thomas's post:

> mydf

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
1  1 NA NA NA  4 55 NA 92 NA  NA  NA  NA  NA  NA  NA  NA  62  NA
2  3 NA NA NA  7 43 NA 12 NA  NA  NA  NA  NA  NA  NA  NA  74  NA
3  7 NA NA NA  3 58 NA 52 NA  NA  NA  NA  NA  NA  NA  NA  64  NA
4  0 NA NA NA  6 10 NA 22 NA  NA  NA  NA  NA  NA  NA  NA  96  NA
5  3 NA NA NA  8 13 NA 92 NA  NA  NA  NA  NA  NA  NA  NA  22  NA

> mydf %>% remove_empty("cols")

1  1  4 55 92  62
2  3  7 43 12  74
3  7  3 58 52  64
4  0  6 10 22  96
5  3  8 13 92  22
Dannid
  • 1,507
  • 1
  • 20
  • 17
1

If we have limited columns like you have in this case, below would be neat workaround.

myDF<-myDF[,c(1,2,3,4,5)]
Chirag
  • 1,478
  • 16
  • 20
0

Load the data file, excluding the header row:

mydf <- 
read.csv(text='Identity,Number,Data,Result,Add
1,,,,4,55,,92,,,,,,,,,62,
3,,,,7,43,,12,,,,,,,,,74,
7,,,,3,58,,52,,,,,,,,,64,
0,,,,6,10,,22,,,,,,,,,96,
3,,,,8,13,,92,,,,,,,,,22,', header=FALSE, skip=1)

Then load just the header row:

n <- read.csv(text='Identity,Number,Data,Result,Add', header=TRUE)

Then use the names from n to assign names to the full data, excluding NA columns:

setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))

Note, if you have this saved as a file, the full code should be:

mydf <- read.csv('file.csv', header=FALSE, skip=1)
n <- read.csv('file.csv', header=TRUE, nrows=1)
setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))
Thomas
  • 43,637
  • 12
  • 109
  • 140
  • I have edited my question, now there are equal number of commas in the header as well, so basically it's being read as 16 cols instead of 5, how to solve the problem, and thanks for the help. – user3188390 Feb 28 '14 at 20:38
0

How about you keep only those columns which name is not blank?

> mydf <- mydf[,names(mydf)!=""]
Clarius
  • 1,183
  • 10
  • 10
0

Here's a fancy function I sometimes use for fun. Note that it is not robust - it searches the complete environment for a .data data set, which dplyr puts there if running select() so it depends on their internal framework (which might change):

empty_cols <- function(x) {
  for (frm in sys.frames()) {
    if (is.data.frame(frm$.data)) {
      df <- frm$.data
      return(which(sapply(df, function(col) all(is.na(col)))))
    }
  }
  # found nothing, return NULL to return no columns:
  NULL
}

Now select using:

library(dplyr)
my_df %>% select(!empty_cols())

Would've been great if we could just use dplyr::pick(everything()) in this function, but they have limited the use to mutate(), filter() and group_by()...

MS Berends
  • 4,489
  • 1
  • 40
  • 53