How does one reorder columns in a data frame?

Question

How would one change this input (with the sequence: time, in, out, files):

Time   In    Out  Files
1      2     3    4
2      3     4    5

To this output (with the sequence: time, out, in, files)?

Time   Out   In  Files
1      3     2    4
2      4     3    5

Here's the dummy R data:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

In addition to @Joris's suggestion, try reading sections 2.7 and 5 of the "An Introduction to R" manual: http://cran.r-project.org/doc/manuals/R-intro.html – Gavin Simpson, Apr 11 '11 at 12:06
One additional issue: all the answers require the full list of columns, otherwise they result in subsetting. What if we only want to list a few columns to be ordered as the first ones, but also retaining all the others? — 000andy8484, May 24 '16 at 06:47
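
A minimal base-R sketch for this follow-up: list only the columns you want at the front and let `setdiff()` append all remaining columns in their existing order. The vector name `first` is hypothetical, chosen for illustration.

```r
# The question's example data
table <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Columns to put at the front (hypothetical choice for illustration)
first <- c("Time", "Out")

# Named columns first, then every remaining column in its original order
reordered <- table[c(first, setdiff(names(table), first))]
print(reordered)
```

Nothing is dropped, because `setdiff()` supplies every column not named in `first`.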

score 408 · Answer 1 · edited Apr 29 '16 at 16:00

Your data frame has four columns, which you can index as df[,c(1,2,3,4)]. The empty row index before the first comma means "keep all the rows", and 1,2,3,4 refers to the columns.

To change the order as in the question above, do df2 <- df[,c(1,3,2,4)]

If you want to write the result to a CSV file, do write.csv(df2, file="somedf.csv")
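
A minimal runnable sketch of these steps, assuming the question's example data is stored in `df`. Writing to a temporary file (`tempfile()`) and passing `row.names = FALSE` are assumptions added here, so the example leaves no stray files and does not write an extra row-name column.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Empty row index (before the comma) keeps all rows; the vector picks columns by position
df2 <- df[, c(1, 3, 2, 4)]
print(df2)

# Write the reordered data frame to CSV (a temporary path, for illustration only)
out_file <- tempfile(fileext = ".csv")
write.csv(df2, file = out_file, row.names = FALSE)
```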

edited Apr 29 '16 at 16:00 by Braiam · answered Sep 21 '11 at 07:42 by richiemorrisroe

This is OK when you have a limited number of columns, but what if you have, for example, 50 columns? It would take too much time to type all the column numbers or names. What would be a quicker solution? – Herman Toothrot Aug 30 '13 at 12:01
@user4050: in that case you can use the ":" syntax, e.g. df[,c(1,3,2,4,5:50)]. – dalloliogm Feb 25 '14 at 12:20
To put the columns in idcols at the start: idcols <- c("name", "id2", "start", "duration"); cols <- c(idcols, names(df)[!(names(df) %in% idcols)]); df <- df[cols] – kasterma Jun 10 '14 at 12:49
@user4050: you can also use `df[,c(1,3,2,4:ncol(df))]` when you don't know how many columns there are. – arekolek Mar 15 '16 at 14:28
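
A sketch of the `4:ncol(df)` idea wrapped in a hypothetical helper, `swap_2_3()`. It uses `seq_len(ncol(x))[-(1:3)]` instead of `4:ncol(x)`, because with exactly three columns `4:ncol(x)` evaluates to `c(4, 3)`, which asks for a fourth column that does not exist.

```r
# Hypothetical wide data frame with six columns
wide <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)

# Swap columns 2 and 3 and keep everything after column 3 in place.
# seq_len(ncol(x))[-(1:3)] is empty when x has only three columns,
# whereas 4:ncol(x) would evaluate to c(4, 3) in that case.
swap_2_3 <- function(x) x[, c(1, 3, 2, seq_len(ncol(x))[-(1:3)]), drop = FALSE]

print(names(swap_2_3(wide)))
print(names(swap_2_3(wide[1:3])))
```
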
@user4050 [This answer](http://stackoverflow.com/questions/5620885/how-does-one-reorder-columns-in-a-data-frame/37009127#37009127) proposes a solution that should be more convenient (and less error-prone) when dealing with large numbers of columns. It lets you specify the desired position of chosen variables without worrying about the remaining variables, which are automatically slotted into the remaining positions. – landroni May 09 '16 at 08:57
You can also use dput(colnames(df)), which prints the column names as an R character vector. You can then copy it and rearrange the names. – Chris Jul 27 '16 at 09:14
@landroni that is a really good answer. It's a little verbose (I would pre-filter at the REPL and use that), and in general I think that something like `df %>% names() %>% grep("some_col_name_pattern", ., value = TRUE)`, then subsetting `df` by the names that match (untested), is more elegant. Nonetheless, your answer is more general (if more obscure), so thank you for making this answer better :) – richiemorrisroe Feb 11 '17 at 22:50
@richiemorrisroe Thanks for the feedback. I've now simplified slightly the answer which should make it more readable. – landroni Feb 12 '17 at 22:49
Herman - if you've got 50 columns and you want to custom reorder them, use a helper CSV file with the new column order, e.g. `name_df$new_order` (which you could construct with `write_csv(data.frame(old_order = names(df)), "name_df.csv")`). Then mess with the order outside R and read it back in. Now you can do `df_reordered = df[, name_df$new_order]`. Referencing columns by position number doesn't scale well as the number of columns goes up. – Mike Dolan Fliss Oct 27 '18 at 15:46
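
A sketch of that round trip in base R. It assumes `write.csv()`/`read.csv()` in place of readr's `write_csv()` and a temporary file in place of `name_df.csv`, and it simulates the manual edit by assigning `new_order` in code.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# 1. Export the current column names (a temp file stands in for "name_df.csv")
name_file <- tempfile(fileext = ".csv")
write.csv(data.frame(old_order = names(df)), name_file, row.names = FALSE)

# 2. Read the names back; the hand-made new_order column is simulated here
name_df <- read.csv(name_file, stringsAsFactors = FALSE)
name_df$new_order <- c("Time", "Out", "In", "Files")

# Guard against typos or missing names before reordering
stopifnot(setequal(name_df$new_order, names(df)))

# 3. Reorder the data frame by the edited names
df_reordered <- df[, name_df$new_order]
```
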
@herman_toothrot: for the example of 50 columns, if one wants to use names and base R it could be done as `subset(df, select = c(one, three, two, four:fifty))`. Similar to @dalloliogm comment but with the names of the columns (without quotes) instead of the number of the columns. The use of `subset()` in coding is discouraged, though. See – jorvaor Jul 19 '23 at 08:58

score 210 · Answer 2 · edited Sep 30 '21 at 08:00

# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows

#reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows

edited Sep 30 '21 at 08:00 by salac33 · answered Jun 22 '12 at 14:28 by Xavier Guardiola

Question as a beginner, can you combine ordering by index and by name? E.g. `data <- data[c(1,3,"Var1", 2)]`? – Bram Vanroy Dec 21 '14 at 15:56
@BramVanroy nope, `c(1,3,"Var1", 2)` will be read as `c("1","3","Var1", "2")` because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the *character* names "1", "3", etc. you'll get "undefined columns". `list(1,3,"Var1", 2)` keeps values without type promotion, but you can't use a `list` in the above context. – Terry Brown Jan 18 '15 at 16:05
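
A short sketch of the usual workaround: convert the name to its position with `match()` so the index vector stays numeric. It uses the question's columns, with `"Out"` standing in for `"Var1"`.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Mixing numbers and a string in c() promotes everything to character
mixed <- c(1, "Out", 2)
print(mixed)

# Look up the name's position first, so every element is a number
idx <- c(1, match("Out", names(df)), 2, 4)
reordered <- df[, idx]
print(reordered)
```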

Why does the `mtcars[c(1,3,2)]` subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be `mtcars[,c(1,3,2)]`? – landroni Aug 30 '15 at 13:07
data.frames are lists under the hood, with the columns as the top-level elements, so `mtcars[c(1,3,2)]` selects columns just as it would select list elements – petermeissner Nov 24 '15 at 10:31
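
A small base-R check of that explanation, using the question's data: list-style indexing (no comma) and matrix-style indexing (empty row index) pick the same columns, but only the matrix style drops a single selected column to a plain vector by default.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# A data frame is a list of columns, so list-style indexing selects columns
stopifnot(is.list(df))
by_list   <- df[c(1, 3, 2, 4)]     # list-style: no comma
by_matrix <- df[, c(1, 3, 2, 4)]   # matrix-style: empty row index

# One difference: matrix-style indexing drops a single column to a vector
one_list   <- df[2]     # still a data.frame
one_matrix <- df[, 2]   # plain numeric vector
```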

score 124 · Answer 3 · edited Feb 19 '15 at 10:53

You can also use the subset function:

data <- subset(data, select=c(3,2,1))

You are usually better off using the [] operator as in the other answers, but it may be useful to know that you can subset and reorder columns in a single command.
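
A minimal sketch with the question's data, showing `select =` with bare column names and, in the second call, a row condition and a column reorder in the same command.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# select= accepts bare column names (or positions) and returns them in that order
reordered <- subset(df, select = c(Time, Out, In, Files))

# Rows and columns can be filtered and reordered in one call
first_row <- subset(df, Time == 1, select = c(Out, In))
```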

Update:

You can also use the select function from the dplyr package:

data = data %>% select(Time, Out, In, Files)

I am not sure about its efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following will reorder the columns of the mtcars dataset in the opposite order:

mtcars %>% select(carb:mpg)

And the following will reorder only some columns, and discard others:

mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))

Read more about dplyr's select syntax.
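
For comparison, a sketch of base-R equivalents of the two `mtcars` examples above, using only the built-in `mtcars` data set: `rev()` reverses the column order, and `match()` turns the column ranges into positions. The helper `pos()` is hypothetical, introduced only for readability.

```r
# Equivalent of mtcars %>% select(carb:mpg): reverse the column order
reversed <- mtcars[rev(names(mtcars))]

# Hypothetical helper: position of a column name in mtcars
pos <- function(nm) match(nm, names(mtcars))

# Equivalent of select(mpg:disp, hp, wt, gear:qsec, starts_with('carb')).
# pos("gear"):pos("qsec") runs backwards, just like gear:qsec in select().
some <- mtcars[c(pos("mpg"):pos("disp"), pos("hp"), pos("wt"),
                 pos("gear"):pos("qsec"), grep("^carb", names(mtcars)))]
print(names(some))
```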

edited Feb 19 '15 at 10:53 by guyabel · answered Jul 03 '12 at 13:20 by dalloliogm

There are some reasons not to use `subset()`, see [this question](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset). – MERose Nov 16 '14 at 23:56
Thank you. In any case I would now use the select function from the dplyr package, instead of subset. – dalloliogm Nov 18 '14 at 14:14
When you want to bring a couple of columns to the left hand side and not drop the others, I find `everything()` particularly awesome; `mtcars %>% select(wt, gear, everything())` – guyabel Feb 19 '15 at 10:32
Here is another way to use the everything() select_helper function to rearrange the columns to the right/end. https://stackoverflow.com/a/44353144/4663008 https://github.com/tidyverse/dplyr/issues/2838 Seems like you will need to use 2 select()'s to move some columns to the right end and others to the left. – Arthur Yip Jun 05 '17 at 04:21
The new function dplyr::relocate() is designed exactly for this; see H 1's answer below – Arthur Yip Apr 20 '20 at 07:03

score 49 · Answer 4 · edited May 23 '17 at 12:26

As mentioned in this comment, the standard suggestions for re-ordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.

This function lets you re-arrange columns by position: specify a variable name and its desired position, and don't worry about the other columns, which fill the remaining positions in their original relative order.

##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars
    ##sanity checks
    stopifnot( !any(duplicated(var.nms)), 
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms), 
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0), 
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec, drop = FALSE]
    return(data)
}

Now the OP's request becomes as simple as this:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

arrange.vars(table, c("Out"=2))
##  Time Out In Files
##1    1   3  2     4
##2    2   4  3     5

To additionally swap Time and Files columns you can do this:

arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
##  Files Out In Time
##1     4   3  2    1
##2     5   4  3    2

Very nice function. I added a modified version of this function to my [personal package](https://github.com/Deleetdk/kirkegaard). — CoderGuy123, Jul 06 '16 at 12:12
This is really useful - it's going to save me a lot of time when I just want to move one column from the end of a really wide tibble to the beginning — Mrmoleje, May 20 '19 at 13:48

score 43 · Answer 5 · edited Oct 18 '19 at 11:03

A dplyr solution (part of the tidyverse package set) is to use select:

select(table, "Time", "Out", "In", "Files") 

# or

select(table, Time, Out, In, Files)

edited Oct 18 '19 at 11:03 by David Tonhofer · answered Jun 07 '18 at 16:55 by Ben G

The best option for me. Even if I had to install it, it is by far the clearest possibility. – Garini Jun 18 '18 at 15:41
Tidyverse (dplyr in fact) also has the option to select groups of columns, for example to move the Species variable to the front: `select(iris, Species, everything())`. Also note that quotes are not needed. – Paul Rougieux Aug 16 '18 at 12:34
It's important to note that this will drop all columns which are not explicitly specified unless you include `everything()` as in PaulRougieux's comment – divibisan Mar 21 '19 at 16:40
`dplyr`'s `group` will also rearrange the variables, so watch out when using that in a chain. – David Tonhofer Oct 18 '19 at 11:15
As of `dplyr` version `1.0.0` they added a `relocate()` function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column. – otteheng Dec 16 '20 at 20:49

score 40 · Answer 6 · answered Apr 15 '20 at 14:18

dplyr version 1.0.0 includes the relocate() function to easily reorder columns:

dat <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

library(dplyr) # from version 1.0.0 only

dat %>%
  relocate(Out, .before = In)

or

dat %>%
  relocate(Out, .after = Time)
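
If dplyr 1.0.0 or later is not available, a hypothetical base-R helper can mimic `relocate(Out, .after = Time)` for a single column. This is a sketch under that assumption, not how dplyr implements `relocate()`.

```r
# Hypothetical base-R helper that mimics relocate(col, .after = target)
relocate_after <- function(df, col, target) {
  stopifnot(col %in% names(df), target %in% names(df), col != target)
  rest <- setdiff(names(df), col)          # all other columns, original order
  pos <- match(target, rest)               # where the target sits without col
  df[append(rest, col, after = pos)]       # insert col right after target
}

dat <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
relocated <- relocate_after(dat, "Out", "Time")
print(relocated)
```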

answered Apr 15 '20 at 14:18 by Ritchie Sacramento

That's a very neat solution. Thanks! – Sandy Jul 05 '21 at 08:53
This is probably the most flexible and simple solution. Thanks! – Dominique Paul May 27 '22 at 10:56

score 27 · Answer 7 · answered Feb 25 '15 at 16:44

Maybe it's a coincidence, but the column order you want happens to have the column names in descending alphabetical order. Since that's the case, you could just do:

df<-df[,order(colnames(df),decreasing=TRUE)]

That's what I use when I have large files with many columns.
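
A quick check with the question's data. Note that `order()` on character vectors follows the current locale's collation, so names that differ in case or contain punctuation may not sort the same way on every system.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Sort column names Z-to-A; this matches the wanted order only by coincidence
df <- df[, order(colnames(df), decreasing = TRUE)]
print(names(df))
```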

answered Feb 25 '15 at 16:44 by user3482899

`!! WARNING !!` If `TARGET` is a `data.table`, `TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]` turns `TARGET` into an int vector. To fix that, convert it first: `TARGET <- as.data.frame(TARGET)`, then `TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]` – Zachary Ryan Smith Aug 10 '18 at 00:51

score 20 · Answer 8 · edited May 06 '20 at 10:42

You can use the data.table package:

How to reorder data.table columns (without copying)

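library(data.table)
DT <- data.table(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
setcolorder(DT, c("Time", "Out", "In", "Files"))  # reorders by reference, without copying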

edited May 06 '20 at 10:42 by andschar · answered May 20 '15 at 11:23 by usct01

Vrokipal · Answer 9 · 2018-06-19T13:13:19.917

The three top-rated answers have a weakness.

If your dataframe looks like this

df <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

> df
  Time In Out Files
1    1  2   3     4
2    2  3   4     5

then it's a poor solution to use

> df[,c(1,3,2,4)]

It does the job, but you have just introduced a dependence on the order of the columns in your input.

This style of brittle programming is to be avoided.

Explicitly naming the columns is a better solution:

df[,c("Time", "Out", "In", "Files")]

Plus, if you intend to reuse your code in a more general setting, you can simply write

out.column.name <- "Out"
in.column.name <- "In"
df[,c("Time", out.column.name, in.column.name, "Files")]

which is also quite nice because it fully isolates literals. By contrast, if you use dplyr's select

df <- df %>% select(Time, Out, In, Files)

then you'd be setting up those who will read your code later, yourself included, for a bit of a deception. The column names are being used as literals without appearing in the code as such.
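
A small sketch of the brittleness argument, under the assumption that an upstream step delivers the same columns in a different order: positional indexing silently returns the wrong layout, while indexing by name still returns the intended one.

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
wanted <- c("Time", "Out", "In", "Files")

# Simulate an upstream change: the same columns arrive in a different order
shuffled <- df[c("Files", "In", "Time", "Out")]

by_position <- shuffled[, c(1, 3, 2, 4)]   # silently wrong now
by_name     <- shuffled[, wanted]          # still correct
print(names(by_position))
print(names(by_name))
```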

Hossein Noorazar · Answer 10 · 2019-06-16T23:49:07.893

data.table::setcolorder(table, c("Time", "Out", "In", "Files"))

edited Jun 16 '19 at 23:49 · answered Mar 25 '19 at 03:23 by Hossein Noorazar

pls state the library you take the function `setcolorder` from. – Triamus Jun 14 '19 at 09:44

score 3 · Answer 11 · answered Jun 14 '22 at 07:09

Dplyr has a function that lets you move specific columns before or after other columns. It is a critical tool when you work with data frames that have many columns (if there are only 4 columns, it's faster to use select as mentioned before).

https://dplyr.tidyverse.org/reference/relocate.html

In your case, it would be:

df <- df %>% relocate(Out, .before = In)

Simple and elegant. It also lets you move several columns together, to the beginning or to the end:

df <- df %>% relocate(any_of(c('ColX', 'ColY', 'ColZ')), .after = last_col())

Again: super powerful when you work with big dataframes :)
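
Without dplyr, a hypothetical base-R helper can approximate the `relocate(any_of(...), .after = last_col())` call: it moves the listed columns to the end and silently skips names that do not exist, similar in spirit to `any_of()`. This is a sketch, not dplyr's implementation.

```r
# Hypothetical base-R helper: move the listed columns to the end,
# skipping names that don't exist (similar in spirit to any_of())
move_last <- function(df, cols) {
  cols <- intersect(cols, names(df))       # keep only columns that exist
  df[c(setdiff(names(df), cols), cols)]    # others first, then the moved ones
}

df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
moved <- move_last(df, c("Time", "NotAColumn", "In"))
print(names(moved))
```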

Cybernetic · Answer 12 · 2018-01-03T16:44:44.077

The only one I have seen work well is from here.

shuffle_columns <- function(invec, movecommand) {
  # Split the command into moves (separated by ";") and each move into words
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",|\\s+"),
                        function(x) x[x != ""])
  # For each move, separate the columns to move from the position keyword
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    # Remove the columns being moved, then re-insert them at the new spot
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp) - 1
      } else if (A == "after") {
        after <- match(ba, temp)
      }
    } else if (A == "first") {
      after <- 0
    } else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}

Use like this:

new_df <- iris[shuffle_columns(names(iris), "Sepal.Width before Sepal.Length")]

Works like a charm.
