Faster way to tag a cell if its column name appears in preceding cells of the same row

Question

I have a big data frame (~ 280000 rows x 1200 columns), each row represents a basket of items.
The first column has the basket id.
The next ~120 columns have either a 4 digit item code (of one of the items present in the basket) or are blank (for remainder of 120 cells after all items of the basket have been accounted for).
The subsequent columns (from 121 till 1200) are each named with one of the unique 4 digit item codes from the items universe. All these columns are blank.

Now, I want to tag the cells in these columns (121 to 1200), if that item (the column name) appears in that row/basket.

Following is a smaller version of the dataframe (df);

df <- data.frame(BasketID = c("001", "002"),
                 Item1 = c(1001, 1002), Item2 = c(1002,""), Item3 = "",
                 `1001` = "", `1002` = "", `1003` = "", check.names=F)

BasketID   Item1   Item2   Item3   ...   1001   1002   1003
001        1001    1002
002        1003

Below is what I require;

BasketID   Item1   Item2   Item3   ...   1001   1002   1003
001        1001    1002                  tag    tag
002        1003                                        tag

I wrote the following for loop to achieve the above;

for (i in rownames(df)) {  

    for (j in colnames(df[,121:1200])) {  

        if (j %in% df[i,121:1200]) { 

            df[i,j] <- "tag"
        }
    }
}

However, since the dataframe is big, the above command is taking forever to run forcing me to abort midway. Is there a more efficient way to do this? Thanks v. much in advance!!

related: https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format — jogo, Feb 28 '18 at 07:59
Hi @jogo, couldn't find the link you shared useful for my problem. Infact, the link popped when I was looking up for a solution to this problem. There, there is a separate vector using which dataframe is being filtered. however, what I need is lookups within my dataframe without filtering. Let me know if you can think of something else as well. Thanks! — Aditya Roongta, Feb 28 '18 at 08:31
Can you please giva [a small reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? i.e. edit your question. — jogo, Feb 28 '18 at 08:35
You can take the example I shared in my post. Consider it to be a 2x6 dataframe like in the example. If I had to run the for loop (shared in my post) on this 2x6 df, I guess it would run in a jiffy. It's just that my actual dataframe is a much bigger version of this 2x6 df, that's why not possible to share here. — Aditya Roongta, Feb 28 '18 at 08:40
Please use `dput(df)` to show your dataframe (or a definition of it). What value is in your dataframe `df` for `BasketID` 001 and `Item3`? — jogo, Feb 28 '18 at 08:42
Hey @jogo, have modified my post as requested. To answer your question, currently there is no value in BasketID 001 and Item3. — Aditya Roongta, Feb 28 '18 at 09:08
In your example there is a value: it is character with length 0. — jogo, Feb 28 '18 at 09:12

chinsoon12 · Answer 1 · 2018-02-28T09:40:41.653

This might be more easily done using the data.table package. Convert into a long format using melt. Subset to those with valid value. Then pivot the data using length > 0 as the fun.aggregate:

library(data.table)
dcast.data.table(
    melt(setDT(df), id.vars="BasketID"),
    BasketID ~ value,
    function(x) length(x) > 0,
    subset=.(value!=""))

#the join the results with original dataset to get OP's desired format
tags[df, on=.(BasketID)]

#    BasketID  1001  1002  1003 Item1 Item2 Item3
# 1:      001  TRUE  TRUE FALSE  1001  1002      
# 2:      002 FALSE FALSE  TRUE  1003

data:

df <- data.frame(BasketID = c("001", "002"),
    Item1 = c(1001, 1003), Item2 = c(1002,""), Item3 = "")

`D <- melt(setDT(df), id.vars="BasketID"); dcast(D[value!=""], BasketID ~ value, function(x) if (length(x)!=0) "tag" else "")` — jogo, Feb 28 '18 at 09:33

Faster way to tag a cell if its column name appears in preceding cells of the same row

1 Answers1