
I have a dataframe with a column named Stage. The dataframe is generated from a regularly updated Excel file.

This column should only contain a few specific values, such as 'Planning' or 'Analysis', but people occasionally enter custom values and it is impractical to stop them.

I want the dataframe sorted by this column, with a custom sort order that makes sense chronologically (e.g. for us, Planning comes before Analysis). I could implement this using factors (see "Reorder rows using custom order"), but if I use a predefined list of factor levels, I lose any unexpected values that people enter into that column. I am happy for the unexpected values not to be sorted properly, but I don't want to lose them entirely.
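To illustrate where the values get lost, here is a minimal sketch on made-up toy values (not the real spreadsheet data): `factor()` maps any value not listed in `levels` to `NA`, and `table()` drops `NA` by default, so the custom entries silently disappear from the counts.

```r
# Toy Stage values; 'random1' and 'random2' stand in for custom entries
stage <- c('random1', 'Planning', 'Analysis', 'random2')
known_levels <- c('Planning', 'Analysis')

# Values outside known_levels become NA in the factor
f <- factor(stage, levels = known_levels)
print(f)

# table() ignores NA by default (useNA = "no"), so only 2 of the 4 values are counted
print(table(f))
```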

EDIT: The answer by floo0 is amazing, but I neglected to mention that I was planning to barplot the results, with something like

barplot(table(MESH_assurance_involved()[MESH_assurance_involved_sort_order(), 'Stage']), main="Stage became involved")

(The parentheses are there because these are Shiny reactive objects; that shouldn't make a difference.)

The resulting bars are unsorted, although testing in the console shows that the underlying data is sorted.

It isn't just table() breaking the sort order: using ggplot without table() gives the identical result.

Displaying a barplot that keeps the source order seems to require something like "Ordering bars in barplot()", but all the solutions I have found require factors, and combining them with the solution here is somehow not working for me.
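One possible way to keep the bars in the custom order, sketched on toy data rather than the Shiny objects (the data and the name `all_levels` are illustrative assumptions): build the level list as the known stages followed by whatever other values appear, then factor the column with those levels. `table()` reports counts in level order and `barplot()` draws the bars in that order, so no separate row sort is needed for the plot.

```r
# Toy data standing in for MESH_assurance_involved(); 'random1'/'random2' are custom entries
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2', 'Planning'),
                  stringsAsFactors = FALSE)
known_levels <- c('Planning', 'Analysis')

# Known stages first, then any other values in order of first appearance
all_levels <- c(known_levels, setdiff(dat$Stage, known_levels))

# No value is outside all_levels, so nothing becomes NA
stage_f <- factor(dat$Stage, levels = all_levels)

# table() counts in level order; barplot() draws bars in that same order
counts <- table(stage_f)
barplot(counts, main = "Stage became involved")
```

Because the unexpected values are appended as levels instead of being dropped, they still get bars. The same factor can be used as the x aesthetic in ggplot2, which also orders discrete bars by factor level.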

Asked by anotherfred

1 Answer


Toy data-set:

dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2'), id=1:4,
                  stringsAsFactors = FALSE)

So dat looks as follows:

> dat
     Stage id
1  random1  1
2 Planning  2
3 Analysis  3
4  random2  4

Now you can do something like this:

known_levels <- c('Planning', 'Analysis')
# Stages not in known_levels become NA in the factor; order() puts NAs last by default
my_order <- order(factor(dat$Stage, levels = known_levels, ordered = TRUE))
dat[my_order, ]

Which gives you

     Stage id
2 Planning  2
3 Analysis  3
1  random1  1
4  random2  4
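The unexpected values survive because they become `NA` in the factor, and `order()` places `NA` last by default (`na.last = TRUE`) while leaving ties in their original relative order. As a minimal factor-free variant of the same idea, here is a sketch that swaps in `match()` (a substitute technique, not what the answer above uses), assuming the same toy `dat`:

```r
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2'), id = 1:4,
                  stringsAsFactors = FALSE)
known_levels <- c('Planning', 'Analysis')

# match() gives each value's position in known_levels, or NA if it is not listed
stage_rank <- match(dat$Stage, known_levels)

# order() sorts by that position; NAs (the custom values) go last, in original order
sorted <- dat[order(stage_rank), ]
print(sorted)
```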
Answered by Rentrop
Comment: Thank you. I got my data sorted, but when I barplot it the bars are not displayed in sorted order: `barplot(table(MESH_assurance_involved()[MESH_assurance_involved_sort_order(), 'Stage']), main="Stage became involved")` (the parentheses are because of Shiny). I know that getting a sorted barplot needs some tricks like [Ordering bars in barplot()](http://stackoverflow.com/questions/19681586/ordering-bars-in-barplot), but I don't understand how to get that working here: I understand how your code works, but factoring the data and applying the sort order at the same time seems beyond me. – anotherfred Nov 24 '16 at 11:09
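A sketch of one way to combine the two steps, on toy data (the data and the names `sorted`, `stage_f` are illustrative assumptions): once the rows are sorted as in the answer, the order in which each Stage first appears in the sorted column is already the desired bar order, so `unique()` of the sorted column can serve as the factor levels.

```r
# Toy data; 'random1'/'random2' are custom entries, 'Analysis' appears twice
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2', 'Analysis'),
                  id = 1:5, stringsAsFactors = FALSE)
known_levels <- c('Planning', 'Analysis')

# Step 1: sort rows as in the answer (unknown stages become NA and go last)
sorted <- dat[order(factor(dat$Stage, levels = known_levels)), ]

# Step 2: first-appearance order in the sorted column becomes the level order
stage_f <- factor(sorted$Stage, levels = unique(sorted$Stage))

# table() and barplot() now follow the sorted order
counts <- table(stage_f)
barplot(counts, main = "Stage became involved")
```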