
I have a data frame in R with two columns, temp and timeStamp, with temp recorded at regular intervals. A portion of the data frame looks like this:

[screenshot of the original data frame]

I have to create a line chart showing changes in temp over time. As can be seen, the temp value stays the same across several timeStamp values. These repeated values inflate the size of the data file and I want to remove them, so the output should look like this:

[screenshot of the desired output]

That is, keeping just the rows where there is a change. I cannot think of a way to get this done in R. Any pointers in the right direction would be really helpful.

David Arenburg
Zeeshan

2 Answers


One option would be using data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'temp', we subset the first and last observation of each group (.SD[c(1L, .N)]); if a group has only a single row, we keep it as is (else .SD).

library(data.table)
setDT(df1)[, if (.N > 1) .SD[c(1L, .N)] else .SD, by = temp]
#    temp val
#1: 22.50   1
#2: 22.50   4
#3: 22.37   5
#4: 22.42   6
#5: 22.42   7

Or a base R option with duplicated. We check for duplicated values in 'temp' (the output is a logical vector), and also check for duplication from the reverse side (fromLast=TRUE). Use & to find the elements that are TRUE in both cases, negate (!) and subset the rows of 'df1'.

df1[!(duplicated(df1$temp) & duplicated(df1$temp, fromLast = TRUE)), ]
#   temp val
#1 22.50   1
#4 22.50   4
#5 22.37   5
#6 22.42   6
#7 22.42   7

data

df1 <- data.frame(temp=c(22.5, 22.5, 22.5, 22.5, 22.37,22.42, 22.42), val=1:7)
akrun
  • Thanks for your answer. If I group by temp, there are some temp values that are same as in the group but occurring at a later time (after some variation in temp), so this would not be helpful. I am therefore running a loop over all values and selecting only those values where there is a change in temp into a new data frame. Thanks for your response :) – Zeeshan Aug 05 '15 at 14:00
  • 1
    @ZeeshanAKhan It was based on the example you showed. If it is not the expected output, you should change your example and expected output. Also, consider to post a dput output instead of an image – akrun Aug 05 '15 at 14:02
  • 1
    If you create an id column and group by id and temp, they will then be unique throughout. – Pierre L Aug 05 '15 at 14:11
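Following up on the comments above: if the same temperature can recur later after a variation, grouping by temp alone will merge those separate stretches into one group. A base R sketch using rle() (run-length encoding) instead keeps the first and last row of each consecutive run, so a value that reappears later is treated as a new run. Using df1 from the data section above:

```r
# df1 as in the data section above
df1 <- data.frame(temp = c(22.5, 22.5, 22.5, 22.5, 22.37, 22.42, 22.42), val = 1:7)

# rle() encodes consecutive runs of identical values
r <- rle(df1$temp)
run_ends   <- cumsum(r$lengths)            # last row index of each run
run_starts <- run_ends - r$lengths + 1L    # first row index of each run

# Keep only the first and last row of every run
keep <- sort(unique(c(run_starts, run_ends)))
df1[keep, ]
```

This keeps rows 1, 4, 5, 6, 7, matching the output above, but a later, separate run of 22.50 would be kept as its own run rather than merged with the first one.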

Here's a dplyr solution:

library(dplyr)

# Toy data
df <- data.frame(time = seq(20), temp = c(rep(60, 5), rep(61, 7), rep(59, 3), rep(60, 5)))

# Now filter for the first and last rows and ones bracketing a temperature change
df %>% filter(temp!=lag(temp) | temp!=lead(temp) | time==min(time) | time==max(time))

  time temp
1    1   60
2    5   60
3    6   61
4   12   61
5   13   59
6   15   59
7   16   60
8   20   60

If the data are grouped by a third column (id), just add group_by(id) %>% before the filtering step.
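For example, a minimal sketch of that grouped variant — the id column and its values here are made up for illustration (dplyr's lag() and lead() respect the grouping, so comparisons never cross an id boundary):

```r
library(dplyr)

# Hypothetical data for two sensors, "a" and "b"
df2 <- data.frame(
  id   = rep(c("a", "b"), each = 10),
  time = rep(seq(10), 2),
  temp = c(rep(60, 4), rep(61, 6),   # sensor a: one change at time 5
           rep(59, 7), rep(60, 3))   # sensor b: one change at time 8
)

# Within each id, keep the first and last rows plus rows bracketing a change
df2 %>%
  group_by(id) %>%
  filter(temp != lag(temp) | temp != lead(temp) |
         time == min(time) | time == max(time)) %>%
  ungroup()
```

For sensor "a" this keeps the rows at times 1, 4, 5 and 10; for sensor "b", times 1, 7, 8 and 10.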

ulfelder
  • Thanks for your answer. If I group by temp, there are some temp values that are same as in the group but occurring at a later time (after some variation in temp), so this would not be helpful. I am therefore running a loop over all values and selecting only those values where there is a change in temp into a new data frame. Thanks for your response :) – Zeeshan Aug 05 '15 at 14:00
  • I don't understand what you're saying. You don't need to group by temp. – ulfelder Aug 05 '15 at 14:03
  • A separate id column can help with that – Pierre L Aug 05 '15 at 14:10