
I have a vector of IDs that are currently stored as a factor. I have a for loop that looks up each of those IDs in a data frame and returns a particular value. I am building a data frame that stores, in column 1, the ID currently being processed by the loop and, in column 2, the value of interest.

The problem is that when I assign the ith ID to my data frame, it stores the factor's level index (its integer code) rather than the ID itself. See the code below.

ref <- unique(yearsd[,11]) # yearsd df has customer records; I'm extracting unique IDs
counter <- data.frame(matrix(ncol = 2, nrow = length(ref))) # initialize counter for for loop

for(i in 1:length(ref))
{
  loc <- which(ref[i] == yearsd[,11]) # returns positions of IDs
  yearTF <- unique(yearsd[loc,3])     # gives me a vector of years that ID shows up
  counter[i,1] = print(ref[i])        # store the ID currently in the loop
  counter[i,2] = length(yearTF)       # store the number of years they show up in the records
}

If the ith element of ref is ABCD and ABCD is the 32nd level of the factor, counter[i,1] ends up as 32 instead of ABCD. Wrapping the value in print(ref[i]), as above, did not help either. I always get the factor's level index.
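A minimal reproduction of the behavior described above, using two hypothetical IDs instead of the real yearsd data. With the same all-NA initialization as in the question, both columns start out logical, and dropping a factor element into one of them keeps only the integer code:

```r
# Hypothetical IDs; factor levels are sorted, so "ABCD" is level 1 and "ZZ9" is level 2
ids <- factor(c("ZZ9", "ABCD"))

# Same initialization as in the question: an all-NA matrix, so both columns are logical
counter <- data.frame(matrix(ncol = 2, nrow = 2))
sapply(counter, class)   # X1 and X2 are both "logical"

# Assigning a factor element into a logical column keeps only its integer code
counter[1, 1] <- ids[1]
counter[1, 1]            # 2, the level index of "ZZ9", not the label
class(counter[[1]])      # "integer"
```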

Would it be better if I just change it to character? They are alphanumeric strings.

Edit

  • yearsd is a df with customer records
  • yearsd[,11] contains the customer ID
  • for each record, there is a transaction date, which stores only the year, e.g. 2005, 2006, etc.

I am trying to go through yearsd to build a df containing the customer IDs in one column and, in the second column, a count of how many different years each customer had transactions.

Example Output:

CustID   YearsIn
A0001    3
D504     1
RR45Y    2

This means customer A0001 had transactions in 3 different years, D504 in only 1 year, and RR45Y in 2 different years. Each customer may have multiple transactions in a year; I only care whether they had at least one, and if so, I count that year for that customer.

Let me know if you have any questions. I appreciate the help.

Jason
  • I think your counter is a numeric matrix. That's why it's not able to save a character in it. Try saving into a list and then converting it into a data frame. – Koundy Dec 11 '14 at 06:36
  • A complete [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful here. It's unclear the exact data types of all objects involved. I'm not sure where you are "returning a value" from the assignment. – MrFlick Dec 11 '14 at 06:57
  • So you're just trying to count the number of times that each year appears? You probably should just be using `table(yearsd[,11])` for that. But `yearsd` still isn't defined here, so I'm unclear as to its exact structure, and the question is still not reproducible. I think you should be using `table()` or `aggregate()` or some other more R-like function here. Give sample input and desired output. – MrFlick Dec 11 '14 at 06:59
  • @koundy I did a unit test with counter initialized as suggested in http://stackoverflow.com/questions/12613909/how-to-create-empty-data-frame-with-column-names-specified-in-r where counter[,1] is character. It's still storing the ith level's index value. – Jason Dec 11 '14 at 07:03
  • @MrFlick I'm trying to count how many years an ID shows up. An ID may show up 3 times in 2011 and 4 times in 2012. Thus, the count would be 2. Will add another edit to be clear. – Jason Dec 11 '14 at 07:09

1 Answer


How about using aggregate instead (since this seems to be the problem you are really trying to solve)?

#sample data
dd<-data.frame(
    cust=rep(c("A001", "D504","RR457"), c(3,1,2)),
    year = c(2001:2003, 2002, 2003:2004)
)

aggregate(year~cust, dd, function(x) length(unique(x)))

#    cust year
# 1  A001    3
# 2  D504    1
# 3 RR457    2
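Along the lines of the `table()` suggestion in the comments, another base R sketch: drop duplicate customer/year pairs first, then count the remaining rows per customer. The sample data is hypothetical (it borrows the IDs from the question's example output and adds a repeat transaction in 2005 for A0001), and `pairs`, `years_in` and `result` are illustrative names only.

```r
# Hypothetical sample data using the IDs from the question's example output;
# A0001 has two transactions in 2005, which should count as one year
dd <- data.frame(
  cust = c("A0001", "A0001", "A0001", "A0001", "D504", "RR45Y", "RR45Y"),
  year = c(2005, 2005, 2006, 2007, 2006, 2005, 2006),
  stringsAsFactors = FALSE
)

# Keep one row per customer/year pair, so repeat transactions in a year count once
pairs <- unique(dd[, c("cust", "year")])

# Counting rows per customer now gives the number of distinct years
years_in <- table(pairs$cust)

# Convert to the two-column layout asked for in the question
result <- data.frame(CustID = names(years_in),
                     YearsIn = as.vector(years_in),
                     stringsAsFactors = FALSE)
result   # CustID A0001, D504, RR45Y with YearsIn 3, 1, 2
```

Because the IDs are kept as character here, no factor codes can leak into the result.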

But going back to your original problem: you can't really initialize a data.frame that way. data.frame(matrix(ncol = 2, nrow = length(ref))) fills every cell with NA, and R gives those columns the simplest data type it can (logical). Assigning a factor into a logical column then keeps only the factor's integer codes. If you wanted to pre-fill the data.frame, a better strategy would have been

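ref <- unique(dd$cust)
# give the id column the right type (a factor with the IDs as levels) up front
counter <- data.frame(id=factor(NA,levels=ref), 
    count=numeric(length(ref)), stringsAsFactors=FALSE) 

for(i in seq_along(ref)) {
  loc <- which(ref[i] == dd$cust)     # rows belonging to this customer
  yearTF <- unique(dd[loc,"year"])    # distinct years for this customer
  counter[i,1] <- ref[i]
  counter[i,2] <- length(yearTF)
}
counter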

or even just doing

counter[i,1] <- as.character(ref[i])

would have forced the conversion to character. print() does not do that: it returns its argument unchanged, so print(ref[i]) is still a factor.
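A small check of the point above, with hypothetical IDs standing in for ref: print() hands back the same factor it was given, while as.character() yields the label, and assigning that label into the all-NA column from the question stores the ID rather than a level index.

```r
# Hypothetical IDs; "ZZ9" is level 2 of the factor
ref <- factor(c("ZZ9", "ABCD"))

# print() shows the label, but it returns its argument (invisibly) unchanged,
# so what gets assigned is still a factor
same <- print(ref[1])
class(same)                        # "factor"

# as.character() turns the factor element into its label
as.character(ref[1])               # "ZZ9"

# Assigning that character value into a logical (all-NA) column coerces the
# column to character, so the label is what gets stored
counter <- data.frame(matrix(ncol = 2, nrow = 2))
counter[1, 1] <- as.character(ref[1])
counter[1, 1]                      # "ZZ9"
```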

MrFlick
  • Thanks! Exactly the result I was trying to get and appreciate the tips on initialization. – Jason Dec 11 '14 at 07:29