Convert a list of lists back to dataframe column to be used as condition for row deletion

Question

I am working with a dataframe that consists of a collection of social media posts. After parsing, stemming, and cleaning the text column of that dataframe, I want to match the output (mylist, which is a list of lists) back to the original dataframe (mydf) and remove the rows of mydf whose parsed/cleaned text has zero length (i.e., character(0)).

I have consulted some previous posts (1, 2). However, my data contain several foreign-language posts (e.g., row 6) whose text is segmented differently and returned as a vector of several string objects rather than a single string. As a result, the approach recommended in 1 didn't work, because R could not tell where the Chinese sentence ends.

Part of my data is provided below. I would appreciate it if someone could shed some light on this.

# part of the data
mydf <- data.frame(
  document = c("I want an apple", "//:", "This is a dog", "Suppose that...", "@%!!", "半夜快笑死"),
  id = c(1, 2, 3, 4, 5, 6),
  gender = c("M", "F", "M", "M", "F", "?"),
  source = c("Facebook", "Facebook", "Twitter", "Facebook", "Twitter", "Weibo")
)

# the parsed/stemmed text output
mylist <- list()
mylist[1] = "i want an apple"
mylist[2] = list(character(0))
mylist[3] = "this is a dog"
mylist[4] = "suppose that"
mylist[5] = list(character(0))
mylist[6] = list(c("半夜", "快", "笑死"))

mylist

# I want to delete the rows of mydf whose corresponding element in mylist has zero length

score 1 · Accepted Answer · answered Aug 01 '19 at 18:43

Is this close to what you need?

  mydf[as.logical(lengths(mylist)), ]
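To spell out why the one-liner works: lengths() returns the number of elements in each list entry, character(0) has length 0, and as.logical() turns 0 into FALSE and any nonzero count into TRUE. The sketch below rebuilds the question's data and writes the same filter with an explicit comparison. The names keep, result, and clean_text are made up for illustration, and the last step (collapsing each entry into one string so it can live in a dataframe column) is an optional variation, not part of the answer above.

```r
# Rebuild the example data from the question
mydf <- data.frame(
  document = c("I want an apple", "//:", "This is a dog", "Suppose that...", "@%!!", "半夜快笑死"),
  id = c(1, 2, 3, 4, 5, 6),
  gender = c("M", "F", "M", "M", "F", "?"),
  source = c("Facebook", "Facebook", "Twitter", "Facebook", "Twitter", "Weibo")
)
mylist <- list(
  "i want an apple", character(0), "this is a dog",
  "suppose that", character(0), c("半夜", "快", "笑死")
)

# lengths() gives one count per list entry; entries holding character(0) count 0
keep <- lengths(mylist) > 0

# Same row filter as mydf[as.logical(lengths(mylist)), ]
result <- mydf[keep, ]

# Optional (an assumption about what is wanted): store the cleaned text as a
# column by collapsing each entry into a single space-separated string;
# paste() on character(0) with collapse yields ""
result$clean_text <- vapply(mylist[keep], paste, FUN.VALUE = character(1), collapse = " ")
```

Because the filter relies only on the number of elements in each entry, the segmented Chinese post in row 6 (three tokens) is kept without needing to know where the sentence ends.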

Pablo Rod


Wow, this is very neat. Thanks a lot! – Chris T. Aug 01 '19 at 19:02
