Add new column based on changes in text in another column

Question

MY DATA

Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)

MY PROBLEM

I wish to add a new column, Entry, that contains a different ID whenever the value in Fruits changes from the row above.

EXAMPLE OF WHAT I WOULD LIKE

Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)
Entry <- c(1, 1, 2, 2, 3, 3, 4)

Note how the second entry of "Orange" receives a different ID to the first, even though it is added at the same Location. My thought is to write a loop that would iterate over Fruits for a change in text, placing a value in Entry. All values in Entry must be consecutive. This seems a simple exercise but I am stuck!

Thank you.

Please post a **[reproducible code snippet](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)**. Use `dput()` and give us just a snippet of your dataframe. — smci, Apr 27 '15 at 01:32

score 2 · Accepted Answer · answered Apr 27 '15 at 02:02


This is a typical `rle` problem: you can get what you are looking for by expanding the `lengths` component of the `rle` result:

> A <- rle(Fruits)
> rep(seq_along(A$lengths), A$lengths)
[1] 1 1 2 2 3 3 4
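To attach the result as a column, the same idea can be applied to a data frame. This is a minimal sketch using the vectors from the question; the data frame name `df` and the helper name `runs` are assumptions, not names from the original post.

```r
# Data from the question, assembled into a data frame (`df` is a hypothetical name)
Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")
Location <- c(10, 11, 15, 16, 10, 11, 30)
df <- data.frame(Fruits, Location, stringsAsFactors = FALSE)

# rle() collapses consecutive repeats into runs and records each run's length
runs <- rle(df$Fruits)

# Number the runs 1, 2, 3, ... and repeat each number once per row in that run
df$Entry <- rep(seq_along(runs$lengths), runs$lengths)
df$Entry
```

Note that `rle()` expects an atomic vector, so if the column is stored as a factor, wrapping it in `as.character()` first is a common workaround.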

answered Apr 27 '15 at 02:02 by A5C1D2H2I1M1N2O1R2T1


Similarly, `cumsum` can do it - `cumsum(Fruits != c(".NOTHING.",head(Fruits,-1)))` – thelatemail Apr 27 '15 at 02:18
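Spelled out step by step, the `cumsum` suggestion might look like the sketch below. It compares each element with the one before it; the placeholder `".NOTHING."` only needs to differ from the first value so that the first row starts a new group. The intermediate name `changed` is an assumption added for readability.

```r
Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")

# Shift the vector down by one, padding the front with a placeholder
previous <- c(".NOTHING.", head(Fruits, -1))

# TRUE wherever the value differs from the row above
changed <- Fruits != previous

# Each TRUE counts as 1, so the running total increases at every change
Entry <- cumsum(changed)
Entry
```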

score 0 · Answer 2 · answered Apr 27 '15 at 01:34


table(df$Fruits)

is what you want to get the frequency distribution of the number of fruits within the dataset. If you want distribution by fruit and location, then tell us that.

answered Apr 27 '15 at 01:34 by smci

@TimBiegeleisen My dataset is 344156 obs. of 2 variables, the above is an abbreviated version for example purposes. I used fruits as my data contains confidential names. I wish to... a) Create a frequency distribution of the number of fruits within the dataset. Simply counting the occurrence of different fruits (for example, the number of times orange appears) is misleading. b) My main aim is to count the interaction between fruits... what fruit appears after another fruit at each location. Hence the need for the different ID. – user2716568 Apr 27 '15 at 01:39
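One possible reading of the two aims in this comment is to count runs rather than rows, and then tabulate which fruit follows which. The sketch below works under that assumption and ignores Location; the names `run_values`, `run_counts` and `transitions` are hypothetical.

```r
Fruits <- c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry")

# One value per run of consecutive repeats
run_values <- rle(Fruits)$values

# (a) Frequency of runs per fruit, rather than of raw rows
run_counts <- table(run_values)

# (b) Pair each run with the run that follows it and count the pairs
transitions <- table(from = head(run_values, -1), to = tail(run_values, -1))
transitions
```

Here Orange is counted as two separate entries instead of four rows, and `transitions` has one cell per "from" and "to" pair.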

score 0 · Answer 3 · edited May 23 '17 at 11:51


Using @thelatemail's advice and an answer from... Creating a column in r that auto-increments based on other columns, I utilised the following code:

indx <- as.character(interaction(Analysis[c(1)]))
Analysis$Entry <- cumsum(c(TRUE,indx[-1]!=indx[-length(indx)]))

Where Analysis is my data.frame and Fruits is the first column.
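For anyone wanting to try this on the data from the question, here is a self-contained sketch. The construction of `Analysis` is an assumption based on the description above (Fruits as the first column); the two lines that follow are the code from this answer.

```r
# Hypothetical reconstruction of the data frame described in the answer
Analysis <- data.frame(
  Fruits   = c("Orange","Orange","Pineapple","Pineapple","Orange","Orange","Blueberry"),
  Location = c(10, 11, 15, 16, 10, 11, 30),
  stringsAsFactors = FALSE
)

# Build a key from the first column, then compare each key with the previous one
indx <- as.character(interaction(Analysis[c(1)]))
Analysis$Entry <- cumsum(c(TRUE, indx[-1] != indx[-length(indx)]))
Analysis
```

With a single column, `interaction()` only converts the values to a factor; it becomes useful when the key should combine several columns.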

edited May 23 '17 at 11:51 by Community · answered Apr 27 '15 at 03:13 by user2716568
