MY DATA

Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)

MY PROBLEM

I wish to add a new column, Entry, that contains a new ID whenever the value of Fruits changes from the row above.

EXAMPLE OF WHAT I WOULD LIKE

Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)
Entry <- c(1, 1, 2, 2, 3, 3, 4)

Note how the second run of "Orange" receives a different ID from the first, even though it occurs at the same Location values. My thought is to write a loop that iterates over Fruits and, whenever the text changes, places a new value in Entry. All values in Entry must be consecutive. This seems a simple exercise but I am stuck!

Thank you.

rawr
user2716568
  • Please post a **[reproducible code snippet](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)**. Use `dput()` and give us just a snippet of your dataframe. – smci Apr 27 '15 at 01:32
  • Distribution just by fruit, or by fruit and location? – smci Apr 27 '15 at 01:35

3 Answers

This is a typical `rle` (run-length encoding) problem: you can get what you are looking for by repeating each run's index as many times as the matching entry in the `lengths` component of the `rle` result:

> A <- rle(Fruits)
> rep(seq_along(A$lengths), A$lengths)
[1] 1 1 2 2 3 3 4
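
To attach the result as a column, something like the following sketch may help. It assumes the two vectors from the question are combined into a data frame named `df` (a name chosen here for illustration) and that `Fruits` is stored as character. If it is a factor, wrap it in `as.character()` first, since `rle()` only accepts atomic vectors.

```r
# Example data from the question; `df` is an assumed name
Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)
df <- data.frame(Fruits, Location, stringsAsFactors = FALSE)

# One run per stretch of identical consecutive values
runs <- rle(df$Fruits)

# Repeat each run's index once for every row in that run
df$Entry <- rep(seq_along(runs$lengths), runs$lengths)
df$Entry
# [1] 1 1 2 2 3 3 4
```

Because the IDs come from `seq_along()`, they are consecutive by construction.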
A5C1D2H2I1M1N2O1R2T1
table(df$Fruits)

is what you want to get the frequency distribution of the number of fruits within the dataset. If you want distribution by fruit and location, then tell us that.
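
As a rough illustration on the question's sample data (with `df` as an assumed data frame name), the one-way and two-way counts could look like this:

```r
# Example data from the question; `df` is an assumed name
df <- data.frame(
  Fruits = c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry"),
  Location = c(10, 11, 15, 16, 10, 11, 30),
  stringsAsFactors = FALSE
)

# Frequency of each fruit across all rows
counts <- table(df$Fruits)
counts

# Frequency of each fruit at each location
by_location <- table(df$Fruits, df$Location)
by_location
```

Note that this counts rows, not runs, so the two separate stretches of "Orange" are pooled together.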

smci
  • @TimBiegeleisen My dataset is 344156 obs. of 2 variables; the above is an abbreviated version for example purposes. I used fruits as my data contains confidential names. I wish to... a) Create a frequency distribution of the number of fruits within the dataset. Simply counting the occurrence of different fruits (for example, the number of times orange appears) is misleading. b) My main aim is to count the interaction between fruits... what fruit appears after another fruit at each location. Hence the need for the different ID. – user2716568 Apr 27 '15 at 01:39

Using @thelatemail's advice and an answer from... Creating a column in r that auto-increments based on other columns, I utilised the following code:

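# Collapse the first column (Fruits) into a character vector
indx <- as.character(interaction(Analysis[c(1)]))
# Start at 1 and add 1 each time a value differs from the one before it
Analysis$Entry <- cumsum(c(TRUE, indx[-1] != indx[-length(indx)]))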

Where Analysis is my data.frame and Fruits is the first column.
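
For reference, here is a minimal sketch of the same idea on the sample data from the question. It skips the `interaction()` step and compares the `Fruits` column directly, which is an assumption that only one column defines a change.

```r
# Sample data from the question, stored under the name used above
Analysis <- data.frame(
  Fruits = c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry"),
  Location = c(10, 11, 15, 16, 10, 11, 30),
  stringsAsFactors = FALSE
)

# TRUE for the first row and for every row that differs from the previous one
changed <- c(TRUE, Analysis$Fruits[-1] != Analysis$Fruits[-nrow(Analysis)])

# The running total of changes gives consecutive IDs
Analysis$Entry <- cumsum(changed)
Analysis$Entry
# [1] 1 1 2 2 3 3 4
```

The comparison is vectorised, so no explicit loop is needed even for a few hundred thousand rows.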

Community
user2716568