I have a large dataset in CSV:
- There are 50,000 rows; each row is one transaction.
- Each transaction contains between 1 and 5 items.
- There are 5000 different possible item values.
- There are no duplicate items in a transaction.
After loading the CSV into RStudio and applying unclass(), I apply as(...,"transactions").
The result is something like this:
# transactions in sparse format with
# 5 transactions (rows) and
# 1455 items (columns)
Instead of 50,000 transactions, there are only 5 now.
Where have all the transactions gone? Was the matrix somehow transposed? The 5 rows in the result match the number of columns in my CSV, which makes me suspect so.
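To check the transposition suspicion, here is a minimal base-R sketch. The data frame `df` is made up and only stands in for my real CSV, and I am assuming the CSV was read into an ordinary data frame (e.g. by read.csv()). It only compares the number of rows with the number of elements that unclass() hands on:

```r
# Made-up stand-in for the real CSV: 3 transactions (rows), up to 5 items (columns).
df <- data.frame(
  V1 = c("apple", "milk", "bread"),
  V2 = c("milk", "eggs", NA),
  V3 = c("bread", NA, NA),
  V4 = c(NA_character_, NA, NA),
  V5 = c(NA_character_, NA, NA),
  stringsAsFactors = FALSE
)

# unclass() drops the data.frame class and leaves a plain list of columns.
cols <- unclass(df)

n_rows <- nrow(df)          # one per transaction
n_elements <- length(cols)  # one per CSV column

print(c(rows = n_rows, list_elements = n_elements))
```

If the same holds for my real data, the list passed on has one element per column rather than one per row, which would fit the 5 "transactions" I am seeing.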
This may be a data pre-processing problem, but according to this post my input data should have the right format.
[I'm posting for the first time here and am fairly new to R/RStudio.]