- I have a large data frame (~280,000 rows x 1,200 columns) in which each row represents a basket of items.
- The first column has the basket id.
- The next ~120 columns each contain either the 4-digit code of one item in the basket or a blank (the cells left over once all of the basket's items have been listed).
- The remaining columns (roughly 121 to 1200) are each named with one of the unique 4-digit item codes from the item universe. All of these columns are blank.

Now I want to tag a cell in one of these columns (121 to 1200) whenever the item named by that column appears in that row's basket.
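For a single basket, this rule is a membership test of every tag-column name against that basket's item codes. A minimal sketch in R, using hypothetical values (`basket` and `tag_cols` are illustrative names, not part of the original data):

```r
# Hypothetical item codes read from one basket's item columns (blanks included)
basket <- c("1001", "1002", "")

# Hypothetical item universe, i.e. the names of the tag columns
tag_cols <- c("1001", "1002", "1003")

# One membership test per tag column: "tag" if that code is in the basket
row_tags <- ifelse(tag_cols %in% basket, "tag", "")
names(row_tags) <- tag_cols
print(row_tags)
```

For this basket, the `1001` and `1002` entries become `"tag"` and `1003` stays blank.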
Here is a smaller version of the data frame (`df`):
```r
df <- data.frame(BasketID = c("001", "002"),
                 Item1 = c("1001", "1003"), Item2 = c("1002", ""), Item3 = "",
                 `1001` = "", `1002` = "", `1003` = "", check.names = FALSE)
```
```
BasketID  Item1  Item2  Item3  ...  1001  1002  1003
001       1001   1002
002       1003
```
This is what I need:
```
BasketID  Item1  Item2  Item3  ...  1001  1002  1003
001       1001   1002               tag   tag
002       1003                                  tag
```
I wrote the following `for` loop to do this:
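```r
# Basket-content columns and per-item tag columns
# (assumes the naming used in the small example above)
item_cols <- grep("^Item", names(df))
tag_cols  <- grep("^[0-9]{4}$", names(df), value = TRUE)

for (i in seq_len(nrow(df))) {
  basket <- unlist(df[i, item_cols])  # item codes in this basket
  for (j in tag_cols) {
    if (j %in% basket) {
      df[i, j] <- "tag"
    }
  }
}
```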
However, because the data frame is so large, this loop takes forever to run and I have had to abort it midway. Is there a more efficient way to do this? Thanks very much in advance!
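One way to avoid the cell-by-cell loop is to look up every item code at once with `match()` and then write all tags with a single two-column (row, column) matrix index. The sketch below assumes, as in the small example, that the basket columns are named `Item1`, `Item2`, ... and that the tag columns are named exactly with the 4-digit codes; it has only been reasoned through on the small example, not on the full 280,000-row data.

```r
# Small example in the question's layout, with every code stored as text
df <- data.frame(BasketID = c("001", "002"),
                 Item1 = c("1001", "1003"), Item2 = c("1002", ""), Item3 = "",
                 `1001` = "", `1002` = "", `1003` = "", check.names = FALSE)

# Assumption: basket columns are named Item1, Item2, ...;
# tag columns are named with the 4-digit item codes
item_cols <- grep("^Item", names(df))
tag_cols  <- grep("^[0-9]{4}$", names(df), value = TRUE)

# Basket contents as a character matrix, one row per basket
items <- as.matrix(df[, item_cols, drop = FALSE])

# Position of each cell's code among the tag columns (NA for blanks/unknown codes)
col_idx <- match(items, tag_cols)
hit     <- !is.na(col_idx)

# Two-column (row, column) index of every basket/item pair to tag
idx <- cbind(row(items)[hit], col_idx[hit])

# Build all tags in one vectorised assignment, then write them back
tags <- matrix("", nrow = nrow(df), ncol = length(tag_cols),
               dimnames = list(NULL, tag_cols))
tags[idx] <- "tag"
df[, tag_cols] <- tags

print(df)
```

The lookup touches each basket cell once instead of running nested R-level loops with a data-frame assignment per cell. The result still holds one cell per basket and item code (about 280,000 x 1,080), so memory may become the limiting factor; a logical matrix instead of `"tag"` strings would use less. If the real item columns are numeric, converting them to character before `as.matrix()` keeps the codes comparable to the column names.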