I have a vector of IDs that is currently stored as a factor. I have a for loop that looks up each of those IDs in a data frame and returns a particular value. I am building a data frame that stores the ID currently being run through the loop in column 1 and the value of interest in column 2.
The problem is that when I assign the ith ID to my data frame, it stores the factor's level index number and not the ID itself. See the code below.
ref <- unique(yearsd[,11]) # yearsd df has customer records; I'm extracting the unique IDs
counter <- data.frame(matrix(ncol = 2, nrow = length(ref))) # initialize the results data frame filled in by the loop
for(i in 1:length(ref))
{
    loc <- which(ref[i] == yearsd[,11]) # row positions of the records for this ID
    yearTF <- unique(yearsd[loc,3]) # vector of the distinct years in which this ID shows up
    counter[i,1] = print(ref[i]) # store the ID currently in the loop (this is where I get the index instead)
    counter[i,2] = length(yearTF) # store the number of years in which the ID shows up in the records
}
If the ith element of ref is ABCD and is the 32nd level of the factor, my counter[i,1] value ends up being 32 instead of ABCD. I also tried print(ref[i]) but had no luck with that either. I always get the level's index number of the factor.
Would it be better if I just change it to character? They are alphanumeric strings.
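To make the character idea concrete, here is a minimal sketch of what I have in mind, run on made-up data. The vector ids is a hypothetical stand-in for yearsd[,11], and I am assuming as.character() is the right conversion to use here.

```r
# Made-up IDs standing in for yearsd[,11] (hypothetical data, not my real records)
ids <- factor(c("RR45Y", "A0001", "D504", "A0001"))
ref <- unique(ids)

# A factor stores integer codes that point into its levels
code  <- as.integer(ref[1])    # the level index, which is what I keep getting
label <- as.character(ref[1])  # the ID string itself, which is what I want

# Results data frame with a character column for the ID
counter <- data.frame(CustID  = character(length(ref)),
                      YearsIn = integer(length(ref)),
                      stringsAsFactors = FALSE)
counter[1, 1] <- as.character(ref[1])
```

If that is right, converting the whole column once with as.character(yearsd[,11]) before the loop would presumably avoid the issue altogether.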
Edit
- yearsd is a df with customer records
- yearsd[,11] contains the customer ID
- for each record, there is a transaction date, which stores only the year, e.g. 2005, 2006, etc.
I am trying to go through yearsd to get a df containing those customer IDs in one column and a count of how many distinct years they had transactions in the second column.
Example Output:
CustID YearsIn
A0001 3
D504 1
RR45Y 2
Meaning customer A0001 had transactions in 3 different years, D504 had transactions in only 1 year, and RR45Y had transactions in 2 different years. Each customer may have multiple transactions in a year. I only care to know if they had at least 1; if so, I count that year for that customer.
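To pin down the counting rule, here is a small sketch on toy records that produces the example output above. The data and the column names CustID and Year are made up for illustration, and using tapply() instead of my loop is only one possible way to express it, not necessarily the best one.

```r
# Toy records (hypothetical): several transactions per customer, some in the same year
toy <- data.frame(
  CustID = c("A0001", "A0001", "A0001", "A0001", "D504", "D504", "RR45Y", "RR45Y"),
  Year   = c(2005, 2005, 2006, 2007, 2006, 2006, 2005, 2007),
  stringsAsFactors = TRUE
)

# For each customer, count the distinct years with at least one transaction
years_in <- tapply(toy$Year, toy$CustID, function(y) length(unique(y)))

# names() of the tapply result are the ID labels, so no level indices leak through
result <- data.frame(CustID  = names(years_in),
                     YearsIn = as.vector(years_in),
                     stringsAsFactors = FALSE)
print(result)
```

Repeated transactions within one year (A0001 in 2005, D504 in 2006) are counted only once, which is the behavior I am after.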
Let me know if you have any questions. I appreciate the help.