I have a nested for loop in the code below.

It loops over every column and every row. Is there a simple way to vectorise this?

FYI - the body of the loop checks whether the list in each entry contains only NA; if that is true for every row, the entire column can be removed.


# install.packages("rtweet")
library("rtweet")             
rbloggers <- get_timeline(user = "Rbloggers", n = 10000)
View(rbloggers)
# install.packages("janitor")
library("janitor")             

rbloggers <- janitor::remove_empty(rbloggers, which = "cols")
# this removes the columns with NA or blank - which are not in lists.

# readr::write_csv - would like to use this later and this cannot handle vector of type list.

rbloggers <- as.data.frame(rbloggers)

# Loop over the columns in reverse, so that deleting column j does not
# shift the positions of the columns still to be checked.
for (j in rev(seq_len(ncol(rbloggers)))){

    x <- 0
    for (i in seq_len(nrow(rbloggers))){
      x <- x + all(is.na(rbloggers[i,j][[1]]))
    }

    # if every element is NA, then remove the column
    if(x == nrow(rbloggers)) {rbloggers[,j] <- NULL}

                            # Many ways to remove a column:
                            # # Data[2] <- NULL
                            # # Data[[2]] <- NULL
                            # # Data <- Data[,-2]
                            # # Data <- Data[-2]
}


FYI - I am trying to understand the following references:

Kunal
  • Choice of title is probably suboptimal. Can you perhaps improve on it so that it better describes the problem you're trying to solve? – Roman Luštrik Apr 29 '19 at 09:22
  • Thanks Roman - I tried to use the words "vectorise" and "loop" - this was blocked. So I have tried my best to improve the title: "How to not use a nested for loop and improve my R code?" – Kunal Apr 29 '19 at 09:25
  • What are you trying to do? If I read through the links I might find out, but it would be much easier (and future proof) if you could explain it in your question. – AkselA Apr 29 '19 at 09:37
  • Do you just want to remove columns that are all `NA`? – AkselA Apr 29 '19 at 09:39
  • Possible duplicate of [Remove columns from dataframe where ALL values are NA](https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na) – AkselA Apr 29 '19 at 09:48
  • I think the main issue here is that we are dealing with _list columns_ - they want to check whether all elements of all lists in a column are NA. – neilfws Apr 29 '19 at 22:56

1 Answer

library(rtweet)             
rbloggers <- get_timeline(user = "Rbloggers", n = 10000)

library(janitor)             

rbloggers <- janitor::remove_empty(rbloggers, which = "cols")

# find the sum of NA in each col
colSums(is.na(rbloggers))
#>                user_id              status_id             created_at 
#>                      0                      0                      0 
#>            screen_name                   text                 source 
#>                      0                      0                      0 
#>     display_text_width               is_quote             is_retweet 
#>                      0                      0                      0 
#>         favorite_count          retweet_count               hashtags 
#>                      0                      0                      0 
#>               urls_url              urls_t.co      urls_expanded_url 
#>                      0                      0                      0 
#>       mentions_user_id   mentions_screen_name                   lang 
#>                   3175                   3175                      0 
#>             geo_coords          coords_coords            bbox_coords 
#>                      0                      0                      0 
#>             status_url                   name               location 
#>                      0                      0                      0 
#>            description                    url              protected 
#>                      0                      0                      0 
#>        followers_count          friends_count           listed_count 
#>                      0                      0                      0 
#>         statuses_count       favourites_count     account_created_at 
#>                      0                      0                      0 
#>               verified            profile_url   profile_expanded_url 
#>                      0                      0                      0 
#>           account_lang profile_background_url      profile_image_url 
#>                      0                      0                      0

library(dplyr)

# remove the columns that consist entirely of NA
rbloggers_clean <- rbloggers %>% 
  select(-mentions_user_id, -mentions_screen_name)
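
The `select()` call above names the all-NA columns by hand. If you would rather find them programmatically, including list columns, something like the following should work. This is only a sketch in base R: the toy data frame `df` and the helper `all_na()` are made up for illustration (so it runs without a Twitter API token) and are not part of rtweet or janitor.

```r
# Toy data standing in for the rtweet timeline (hypothetical columns)
df <- data.frame(id = 1:3, text = c("a", "b", "c"), stringsAsFactors = FALSE)
df$hashtags <- list(c("rstats", "r"), NA, "ggplot2")  # list column with some data
df$mentions <- list(NA, c(NA, NA), NA_character_)     # list column, entirely NA
df$empty    <- NA                                     # atomic column, entirely NA

# Hypothetical helper: TRUE when every value in a column is NA.
# unlist() flattens a list column, so one is.na() call covers all rows.
all_na <- function(col) all(is.na(unlist(col)))

# One logical per column, no explicit loop over rows
keep <- !vapply(df, all_na, logical(1))
df_clean <- df[, keep, drop = FALSE]

names(df_clean)
#> [1] "id"       "text"     "hashtags"
```

With the real data you would call the same two lines on `rbloggers` instead of `df`. Note that a list column whose entries are all empty (`NULL` or zero-length) is also treated as all-NA, because `all()` of an empty vector is `TRUE`; the nested loop in the question behaves the same way.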
AkselA