1

It might seem like a duplicate of to these two questions but can't find a solution that does not involve removing the columns, concatenating, then re-inserting them back into the data frame (which is a long winded way i've only managed to produce).

I would like to remove the "dht_t" column and combine its non-NA or "None" rows it with "int_t". these dht columns contain the same data as int_t, but at different timestamps. I want to combine the rows.

dht_t/h are coming from the same sensors as int_t/h, but are put in additional rows for some of dataset.

 head(july11)
                 time hive_id int_t int_h dht_t dht_h     hz     db     pa
1 2015-07-11 00:00:01   hive1  25.3  50.1  None  None 136.72 39.443 100849
2 2015-07-11 00:00:22   hive1  25.3  50.3  None  None    NaN 39.108 100846
3 2015-07-11 00:00:43   hive1  25.3  50.3  None  None    NaN 39.451 100835
4 2015-07-11 00:01:04   hive1  25.3  50.3  None  None    NaN 39.145 100849
5 2015-07-11 00:01:25   hive1  25.3  50.3  None  None    NaN 39.357 100844
6 2015-07-11 00:01:46   hive1  25.3  50.7  None  None    NaN 39.284 100843

this is asection of the data that where the dht_t/h values should be moved to int_t/h columns

this is asection of the data that where the dht_t/h values should be moved to int_t/h columns.

and have an output without dht_t and dht_h

               time hive_id int_t int_h    hz     db     pa
1 2015-07-11 00:00:01   hive1  25.3  50.1  136.72 39.443 100849
2 2015-07-11 00:00:22   hive1  25.3  50.3  NaN    39.108 100846
3 2015-07-11 00:00:43   hive1  25.3  50.3  NaN    39.451 100835
4 2015-07-11 00:01:04   hive1  25.3  50.3  NaN    39.145 100849
5 2015-07-11 00:01:25   hive1  25.3  50.3  NaN    39.357 100844
6 2015-07-11 00:01:46   hive1  25.3  50.7  NaN    39.284 100843
Community
  • 1
  • 1
Evan
  • 217
  • 2
  • 13
  • 5
    Can you also post the expected output? In the example you have all the rows as `None` for `dht_t`. Try `july11$dht_t <- with(july11, ifelse(is.na(dht_t)|dht_t=='None', int_t, dht_t))` – akrun Jul 15 '15 at 14:18
  • yes, repsosted with edits, thanks for suggestion. – Evan Jul 15 '15 at 14:24
  • to clarify, dht_t/h. all other hives do not. – Evan Jul 15 '15 at 14:28
  • I think based on the image showed it must the `ifelse` solution i.e. `july11$int_t <- with(july, ifelse(is.na(int_t)|int_t=='None', dht_t, int_t)` and similarly for the `int_h` column. This can be looped using `Map` though. – akrun Jul 15 '15 at 14:47
  • the code in first comment transferred incorrect values into the dht_t column, which is the column to be deleted. and trying it for july11$int_t also resulted in weird values – Evan Jul 15 '15 at 14:47
  • Yes, it should be the opposite way. Added another version. Hope it helps. After the `ifelse` step is completed, you can remove the `dht` columns with `grep` – akrun Jul 15 '15 at 14:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/83344/discussion-between-evan-and-akrun). – Evan Jul 15 '15 at 14:49

1 Answers1

1

We can use ifelse to replace the values in one column with another. There are other ways to do it. But, this is easy to understand. Create an index ('indx') of the columns that are used for replacement using grep.

indx <- grep('^(int|dht)', names(july11))

As the columns are 'factor' (from OP's comments), we can convert the selected columns in a loop (lapply) to 'numeric'. The non-numeric elements will be coerced to NAs.

july11[indx] <- lapply(july11[indx], function(x) as.numeric(as.character(x)))

We replace the NA values in 'int_t/int_h' with the values in 'dht_t/dht_h'

july11$int_t <- with(july11, ifelse(is.na(int_t), dht_t, int_t))
july11$int_h <- with(july11, ifelse(is.na(int_h), dht_h, int_h))

and remove the 'dht' columns from the dataset.

july11N <- july11[-grep('^dht', colnames(july11))] 
akrun
  • 874,273
  • 37
  • 540
  • 662