389

How would one change this input (with the sequence: time, in, out, files):

Time   In    Out  Files
1      2     3    4
2      3     4    5

To this output (with the sequence: time, out, in, files)?

Time   Out   In  Files
1      3     2    4
2      4     3    5

Here's the dummy R data:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5
zx8754
Catherine
  • 4
    `help(Extract)` also known as `?'['` – Joris Meys Apr 11 '11 at 12:03
  • 3
    In addition to @Joris's suggestion, try reading section 2.7 and section 5 of the "An Introduction to R" manual: http://cran.r-project.org/doc/manuals/R-intro.html – Gavin Simpson Apr 11 '11 at 12:06
  • 5
    One additional issue: all the answers require the full list of columns, otherwise they result in subsetting. What if we only want to list a few columns to be ordered as the first ones, but also retaining all the others? – 000andy8484 May 24 '16 at 06:47

12 Answers

408

Your dataframe has four columns like so df[,c(1,2,3,4)]. Note the first comma means keep all the rows, and the 1,2,3,4 refers to the columns.

To change the order as in the above question do df2 <- df[,c(1,3,2,4)]

If you want to output this file as a csv, do write.csv(df2, file="somedf.csv")
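As a minimal sketch of the steps above, assuming the question's data is stored in `df` (the temporary file name `out_file` is only for illustration):

```r
# Question's example data (named df here for illustration)
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Empty row index keeps all rows; c(1, 3, 2, 4) picks the columns in the new order
df2 <- df[, c(1, 3, 2, 4)]
names(df2)

# Write the reordered data to a temporary CSV file;
# row.names = FALSE leaves out the extra column of row numbers
out_file <- tempfile(fileext = ".csv")
write.csv(df2, file = out_file, row.names = FALSE)
```

Reading the file back with `read.csv(out_file)` should show the columns in the new order.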

Braiam
richiemorrisroe
  • 46
    This is ok when you have a limited number of columns, but what if you have for example 50 columns, it would take too much time to type all column numbers or names. What would be a quicker solution? – Herman Toothrot Aug 30 '13 at 12:01
  • 70
    @user4050: in that case you can use the ":" syntax, e.g. df[,c(1,3,2,4,5:50)]. – dalloliogm Feb 25 '14 at 12:20
  • 1
    to put the columns in idcols at the start: idcols <- c("name", "id2", "start", "duration"); cols <- c(idcols, names(cts)[-which(names(cts) %in% idcols)]); df <- df[cols] – kasterma Jun 10 '14 at 12:49
  • 17
    @user4050: you can also use `df[,c(1,3,2,4:ncol(df))]` when you don't know how many columns there are. – arekolek Mar 15 '16 at 14:28
  • @user4050 [This answer](http://stackoverflow.com/questions/5620885/how-does-one-reorder-columns-in-a-data-frame/37009127#37009127) proposes a solution that should be more convenient (and less error-prone) when dealing with large numbers of columns. It lets you specify the desired position of chosen variables without worrying about the remaining variables, which will automatically be slotted into the remaining positions. – landroni May 09 '16 at 08:57
  • 1
    You can also use dput(colnames(df)), it prints column names in R character format. You can then rearrange the names. – Chris Jul 27 '16 at 09:14
  • @landroni that is a really good answer. It's a little verbose (I would pre-filter at the repl and use that), and in general, I think that ```df` > names(.) > grep "some_col_name_pattern >> df(names %in% .)"``` (untested) is more elegant. But nonetheless, your answer is more general (but more obscure) so thank you for making this answer better :) – richiemorrisroe Feb 11 '17 at 22:50
  • @richiemorrisroe Thanks for the feedback. I've now simplified slightly the answer which should make it more readable. – landroni Feb 12 '17 at 22:49
  • 1
    Herman - if you've got 50 columns and you want to custom reorder them, use a helper csv file with a new column order, e.g. `name_df$new_order` (which you could construct by `write_csv(data.frame(old_order = names(df)), "name_df.csv")`). Then edit the order outside R and read it back in. Now you can `df_reordered = df[, name_df$new_order]`. Referencing columns by position number doesn't scale well as the number of columns goes up. – Mike Dolan Fliss Oct 27 '18 at 15:46
  • @herman_toothrot: for the example of 50 columns, if one wants to use names and base R it could be done as `subset(df, select = c(one, three, two, four:fifty))`. Similar to @dalloliogm comment but with the names of the columns (without quotes) instead of the number of the columns. The use of `subset()` in coding is discouraged, though. See – jorvaor Jul 19 '23 at 08:58
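Several of the comments above deal with moving a few columns to the front without typing out all the others. A base R sketch of that idea, assuming the question's data and a hypothetical choice of leading columns (`first_cols`):

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Columns that should come first, in the desired order (hypothetical choice)
first_cols <- c("Time", "Out")

# setdiff() returns the remaining names in their original order
new_order <- c(first_cols, setdiff(names(df), first_cols))
df <- df[, new_order]
names(df)
```

Because only names are used, this keeps working if the input gains extra columns or its columns arrive in a different order.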
210
# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows

#reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows
salac33
Xavier Guardiola
  • 1
    Question as a beginner, can you combine ordering by index and by name? E.g. `data <- data[c(1,3,"Var1", 2)]`? – Bram Vanroy Dec 21 '14 at 15:56
  • 9
    @BramVanroy nope, `c(1,3,"Var1", 2)` will be read as `c("1","3","Var1", "2")` because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the *character* names "1", "3", etc. you'll get "undefined columns". `list(1,3,"Var1", 2)` keeps values without type promotion, but you can't use a `list` in the above context. – Terry Brown Jan 18 '15 at 16:05
  • 2
    Why does the `mtcars[c(1,3,2)]` subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be `mtcars[,c(1,3,2)]`? – landroni Aug 30 '15 at 13:07
  • 1
    data.frames are lists under the hood with columns as first order items – petermeissner Nov 24 '15 at 10:31
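The type promotion and list-style indexing described in these comments can be checked with a small sketch. The data frame and its column names (`Var1`, `Var2`, `Var3`) are made up for illustration, and `match()` is one possible workaround, not something proposed in the thread:

```r
data <- data.frame(Var1 = 1:2, Var2 = 3:4, Var3 = 5:6)

# Mixing numbers and strings in c() coerces everything to character
mixed <- c(1, 3, "Var1")
class(mixed)

# Possible workaround: translate the name to a position with match(),
# so the index vector stays numeric
idx <- c(3, match("Var1", names(data)), 2)

# List-style indexing (no comma) selects columns, because a data frame
# is a list whose elements are the columns
reordered <- data[idx]
names(reordered)
```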
124

You can also use the subset function:

data <- subset(data, select=c(3,2,1))

It is better to use the [] operator as in the other answers, but it may be useful to know that you can do a subset and a column reorder operation in a single command.

Update:

You can also use the select function from the dplyr package:

data <- data %>% select(Time, Out, In, Files)  # column names are case-sensitive

I am not sure about the efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following will reverse the column order of the mtcars dataset:

mtcars %>% select(carb:mpg)

And the following will reorder only some columns, and discard others:

mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))

Read more about dplyr's select syntax.

guyabel
dalloliogm
  • 6
    There are some reasons not to use `subset()`, see [this question](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset). – MERose Nov 16 '14 at 23:56
  • 2
    Thank you. In any case I would now use the select function from the dplyr package, instead of subset. – dalloliogm Nov 18 '14 at 14:14
  • 112
    When you want to bring a couple of columns to the left hand side and not drop the others, I find `everything()` particularly awesome; `mtcars %>% select(wt, gear, everything())` – guyabel Feb 19 '15 at 10:32
  • 2
    Here is another way to use the everything() select_helper function to rearrange the columns to the right/end. https://stackoverflow.com/a/44353144/4663008 https://github.com/tidyverse/dplyr/issues/2838 Seems like you will need to use 2 select()'s to move some columns to the right end and others to the left. – Arthur Yip Jun 05 '17 at 04:21
  • 4
    new function dplyr::relocate is exactly for this. see H 1 's answer below – Arthur Yip Apr 20 '20 at 07:03
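The `everything()` helper mentioned in the comments can be tried on the question's data. This is a small sketch, assuming dplyr is installed and the data frame is named `df`:

```r
library(dplyr)

df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Listed columns come first; everything() appends the remaining columns
# in their current order, so nothing is dropped
reordered <- df %>% select(Time, Out, everything())
names(reordered)
```

Without `everything()`, the same call would return only `Time` and `Out`.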
49

As mentioned in this comment, the standard suggestions for re-ordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.

This function lets you re-arrange columns by position: specify a variable name and the desired position, and don't worry about the other columns.

##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars
    ##sanity checks
    stopifnot( !any(duplicated(var.nms)), 
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms), 
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0), 
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec]
    return(data)
}

Now the OP's request becomes as simple as this:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

arrange.vars(table, c("Out"=2))
##  Time Out In Files
##1    1   3  2     4
##2    2   4  3     5

To additionally swap Time and Files columns you can do this:

arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
##  Files Out In Time
##1     4   3  2    1
##2     5   4  3    2
Community
landroni
  • Very nice function. I added a modified version of this function to my [personal package](https://github.com/Deleetdk/kirkegaard). – CoderGuy123 Jul 06 '16 at 12:12
  • 3
    This is really useful - it's going to save me a lot of time when I just want to move one column from the end of a really wide tibble to the beginning – Mrmoleje May 20 '19 at 13:48
43

A dplyr solution (part of the tidyverse package set) is to use select:

select(table, "Time", "Out", "In", "Files") 

# or

select(table, Time, Out, In, Files)
David Tonhofer
Ben G
  • 3
    The best option for me. Even if I had to install it, it is clearly the clearest possibility. – Garini Jun 18 '18 at 15:41
  • 22
    Tidyverse (dplyr in fact) also has the option to select groups of columns, for example to move the Species variable to the front: `select(iris, Species, everything())`. Also note that quotes are not needed. – Paul Rougieux Aug 16 '18 at 12:34
  • 5
    It's important to note that this will drop all columns which are not explicitly specified unless you include `everything()` as in PaulRougieux's comment – divibisan Mar 21 '19 at 16:40
  • `dplyr`'s `group` will also rearrange the variables, so watch out when using that in a chain. – David Tonhofer Oct 18 '19 at 11:15
  • As of `dplyr` version `1.0.0` they added a `relocate()` function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column. – otteheng Dec 16 '20 at 20:49
40

dplyr version 1.0.0 includes the relocate() function to easily reorder columns:

dat <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

library(dplyr) # from version 1.0.0 only

dat %>%
  relocate(Out, .before = In)

or

dat %>%
  relocate(Out, .after = Time)
Ritchie Sacramento
27

Maybe it's a coincidence that the column order you want happens to have column names in descending alphabetical order. Since that's the case you could just do:

df <- df[, order(colnames(df), decreasing = TRUE)]

That's what I use when I have large files with many columns.
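Applied to the question's data, the idea looks roughly like this (a sketch, assuming the data frame is named `df`; the intermediate `idx` is only there to make the step visible):

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# order() returns the positions that sort the column names;
# decreasing = TRUE sorts them from Z to A
idx <- order(colnames(df), decreasing = TRUE)
df <- df[, idx]
names(df)
```

This only gives the requested layout because Time, Out, In, Files happens to be reverse alphabetical order; for any other target order, the columns have to be listed explicitly.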

user3482899
  • `!! WARNING !!` `data.table` turns `TARGET` into an int vector: `TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]` to fix that: `TARGET <- as.data.frame(TARGET)` `TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]` – Zachary Ryan Smith Aug 10 '18 at 00:51
20

You can use the data.table package:

How to reorder data.table columns (without copying)

require(data.table)
setcolorder(DT,myOrder)
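The answer does not define `DT` or `myOrder`. A self-contained sketch, with both filled in from the question's data as an assumption:

```r
library(data.table)

# DT and myOrder are not defined in the answer; these values are assumed
DT <- data.table(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))
myOrder <- c("Time", "Out", "In", "Files")

# setcolorder() changes DT by reference, so no assignment is needed
setcolorder(DT, myOrder)
names(DT)
```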
andschar
usct01
19

The three top-rated answers have a weakness.

If your dataframe looks like this

df <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))

> df
  Time In Out Files
1    1  2   3     4
2    2  3   4     5

then it's a poor solution to use

> df[,c(1,3,2,4)]

It does the job, but you have just introduced a dependence on the order of the columns in your input.

This style of brittle programming is to be avoided.

The explicit naming of the columns is a better solution

df[,c("Time", "Out", "In", "Files")]

Plus, if you intend to reuse your code in a more general setting, you can simply

out.column.name <- "Out"
in.column.name <- "In"
df[,c("Time", out.column.name, in.column.name, "Files")]

which is also quite nice because it fully isolates literals. By contrast, if you use dplyr's select

data <- data %>% select(Time, Out, In, Files)

then you'd be setting up those who will read your code later, yourself included, for a bit of a deception. The column names are being used as literals without appearing in the code as such.
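A sketch of the name-based approach with the column names held in variables. The early check for missing columns and the names `wanted` and `missing.cols` are additions for illustration, not part of the answer:

```r
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

out.column.name <- "Out"
in.column.name <- "In"
wanted <- c("Time", out.column.name, in.column.name, "Files")

# Fail early with a clear message if the input lacks an expected column
missing.cols <- setdiff(wanted, names(df))
if (length(missing.cols) > 0) {
  stop("missing columns: ", paste(missing.cols, collapse = ", "))
}

# Selecting by name works regardless of the column order in the input
df <- df[, wanted]
names(df)
```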

Vrokipal
3
# column names are case-sensitive and must match the data exactly
data.table::setcolorder(table, c("Time", "Out", "In", "Files"))
Hossein Noorazar
3

dplyr has a function, relocate(), that lets you move specific columns before or after other columns. That is a critical tool when you work with big data frames (with only 4 columns, it's faster to use select as mentioned before).

https://dplyr.tidyverse.org/reference/relocate.html

In your case, it would be:

df <- df %>% relocate(Out, .before = In)

Simple and elegant. It also allows you to move several columns together, and to move them to the beginning or to the end:

df <- df %>% relocate(any_of(c('ColX', 'ColY', 'ColZ')), .after = last_col())

Again: super powerful when you work with big dataframes :)
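Both uses can be tried on the question's data. This is a sketch assuming dplyr 1.0.0 or later; the column name `NotThere` is deliberately nonexistent, to show how `any_of()` behaves:

```r
library(dplyr)  # relocate() needs dplyr >= 1.0.0

df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Move Out in front of In; all other columns keep their relative order
df1 <- df %>% relocate(Out, .before = In)
names(df1)

# Move several columns to the end; any_of() skips names that do not exist
df2 <- df %>% relocate(any_of(c("Time", "In", "NotThere")), .after = last_col())
names(df2)
```

Using `all_of()` instead of `any_of()` would raise an error for the missing name, which may be preferable when a typo should not pass silently.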

Pau
1

The only one I have seen work well is from here.

shuffle_columns <- function (invec, movecommand) {
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]],
                                 ",|\\s+"), function(x) x[x != ""])
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first",
                              "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp) - 1
      }
      else if (A == "after") {
        after <- match(ba, temp)
      }
    }
    else if (A == "first") {
      after <- 0
    }
    else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}

Use like this:

new_df <- iris[shuffle_columns(names(iris), "Sepal.Width before Sepal.Length")]

Works like a charm.

Cybernetic