Use first row data as column names in R

Question

I have a dirty dataset that I could not read with header = T. After reading and cleaning it, I would like to use what is now the first row as the column names. I tried multiple methods from Stack Overflow without success. What could be the problem?

The dataset t1 should look like this after clean up:

      V1    V2  V3  V4  V5
1   col1    col2    col3    col4
2   row1    2   4   5   56
3   row2    74  74  3   534
4   row3    865 768 8   7
5   row4    68  86  65  87

I tried: colnames(t1) <- t1[1,]. Nothing happens.
I tried: names(t1) <- t1[1,]. Nothing happens.
I tried: lapply(t1, function(x) {names(x) <- x[1, ]; x}). It returns an error message:
```
Error in `[.default`(x, 1, ) : incorrect number of dimensions
```

Could anyone help?

Looking at your data, do you have blanks in some columns? Try str(t1[1,]) and see if it's doing what you expect. — MikeRSpencer, Aug 17 '15 at 16:08
Does this answer your question? [Row to colnames](https://stackoverflow.com/questions/44031720/row-to-colnames) — Sam Firke, Oct 31 '19 at 02:40

score 68 · Answer 1 · answered Dec 12 '19 at 13:01

Sam Firke's ever-useful package janitor has a function especially for this: row_to_names.

Example from his documentation:

library(janitor)

x <- data.frame(X_1 = c(NA, "Title", 1:3),
           X_2 = c(NA, "Title2", 4:6))
x %>%
  row_to_names(row_number = 2)

Pierre L · Answer 2 · 2015-08-17T15:46:23.977

27

header.true <- function(df) {
  names(df) <- as.character(unlist(df[1,]))
  df[-1,]
}

Test

df1 <- data.frame(c("a", 1,2,3), c("b", 4,5,6))
header.true(df1)
  a b
2 1 4
3 2 5
4 3 6
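If the promoted columns should end up numeric rather than character, one possible follow-up is to re-infer the column types after calling header.true. This is only a sketch: the type.convert step is an addition that is not part of the answer above, and `res` is a name chosen here for illustration.

```r
# Same helper as above: promote the first row to column names, then drop it
header.true <- function(df) {
  names(df) <- as.character(unlist(df[1, ]))
  df[-1, ]
}

df1 <- data.frame(c("a", 1, 2, 3), c("b", 4, 5, 6))
res <- header.true(df1)

# Assumption: the remaining values are meant to be numbers.
# Re-infer each column's type from its text representation.
res[] <- lapply(res, function(col) type.convert(as.character(col), as.is = TRUE))
str(res)
```

Going through as.character first means the conversion behaves the same whether the columns started out as character or as factor.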

edited Aug 17 '15 at 15:46

answered Aug 17 '15 at 15:40

Life-saver little function. Every time, I run into the V1 issue, not sure why. – PesKchan Apr 10 '21 at 15:13

mpalanco · Answer 3 · 2015-08-17T18:12:55.117

The columns of your data frame are probably factors, which is why the code you tried didn't work. You can check this with str(df).
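As a rough illustration of that check, the sketch below builds a small stand-in for t1 with factor columns (the data and the names `col_is_factor` and `first_row` are assumptions made up for this example) and then extracts the first row as labels rather than as internal factor codes.

```r
# Hypothetical stand-in for t1, deliberately created with factor columns
t1 <- data.frame(V1 = c("col1", "row1", "row2"),
                 V2 = c("col2", "2", "74"),
                 stringsAsFactors = TRUE)

str(t1)                                   # each column is reported as a Factor
col_is_factor <- sapply(t1, is.factor)    # TRUE for every column here

# Converting each cell of the first row explicitly yields the labels
first_row <- vapply(t1[1, ], as.character, character(1))
```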

First option

Use the argument stringsAsFactors = FALSE when you import your data:

df <- read.table(text =  "V1    V2  V3  V4  V5
                        col1    col2    col3    col4 col5
                        row1    2   4   5   56
                        row2    74  74  3   534
                        row3    865 768 8   7
                        row4    68  86  65  87", header = TRUE, 
                        stringsAsFactors = FALSE )

Then you can use your first attempt, then remove your first row if you'd like:

colnames(df) <- df[1,]
df <- df[-1, ]

Second option

It will work if your columns are factors or characters:

names(df) <- lapply(df[1, ], as.character)
df <- df[-1,]

Output:

  col1 col2 col3 col4 col5
2 row1    2    4    5   56
3 row2   74   74    3  534
4 row3  865  768    8    7
5 row4   68   86   65   87

Not sure if relevant, but I had a matrix, and this solution almost worked except I changed names(df) to colnames(df), and it seems to have worked? — Matthew Kozubov, Oct 25 '21 at 16:32

score 10 · Answer 4 · answered Sep 08 '20 at 03:27

While @sbha has already offered a tidyverse solution, I would like to leave a fully pipeable dplyr option. I agree that this could be an incredibly useful function.

library(dplyr)
data.frame(x = c("a", 1, 2, 3), y = c("b", 4, 5, 6)) %>%
  `colnames<-`(.[1, ]) %>%
  .[-1, ]

mattbawn · Answer 5 · 2015-08-17T16:31:34.640

6

How about:

my.names <- t1[1,]

colnames(t1) <- my.names

i.e. specifically naming the row as a variable?

with the following code:

namex <- c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)

t1 <- data.frame(namex, row1, row2, row3, row4)
t1 <- t(t1)

my.names <- t1[1,]

colnames(t1) <- my.names

It seems to work, but maybe I'm missing something?

edited Aug 17 '15 at 16:31

answered Aug 17 '15 at 15:50

Yes, you are missing two steps: first you need to remove the first row, which you are using as column names, and then convert the `matrix` to a `data.frame`. – Veerendra Gadekar Aug 17 '15 at 17:37

score 6 · Answer 6 · answered Dec 17 '20 at 12:55

You almost had it; you only missed wrapping the row index in a vector with c():

colnames(t1)=t1[c(1),]

Then you can delete the first row, since it now duplicates the column names:

t1=t1[-c(1),]
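A small sketch of these two steps on a hypothetical character-only data frame (the data and the name `same_row` are made up for illustration). Note that `t1[c(1), ]` and `t1[1, ]` select the same row, so the example assumes it is the character column type, rather than the c(), that lets the assignment produce the expected names.

```r
# Hypothetical stand-in for t1 with character (not factor) columns
t1 <- data.frame(V1 = c("col1", "row1", "row2"),
                 V2 = c("col2", "2", "74"),
                 stringsAsFactors = FALSE)

# Both index forms return the same one-row data frame
same_row <- identical(t1[c(1), ], t1[1, ])

colnames(t1) <- t1[c(1), ]   # promote the first row to column names
t1 <- t1[-c(1), ]            # drop the now-redundant first row
```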

Marcus

best solution ever! – anatol May 18 '22 at 05:29

MikeRSpencer · Answer 7 · 2015-08-17T16:21:07.530

5

Take a step back: when you read your data, use skip=1 in read.table to skip the first line entirely. This should make life a bit easier when you're cleaning data, particularly for data types. This is key, as your problem stems from your data being encoded as factors.

You can then read in your column names separately with nrows=1 in read.table.
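The two reads described above might look like the following sketch. The inline text in `raw` is a made-up stand-in for the real file (its header names are borrowed from the example data elsewhere on this page), and `hdr` and `dat` are names chosen here for illustration.

```r
# Hypothetical file contents: a header line followed by the data rows
raw <- "col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534
row3 865 768 8 7
row4 68 86 65 87"

# Read only the first line, forcing every field to plain character
hdr <- read.table(text = raw, nrows = 1, colClasses = "character")

# Read the data rows on their own so numeric columns are typed as numbers
dat <- read.table(text = raw, skip = 1, stringsAsFactors = FALSE)

# Attach the header values as column names
names(dat) <- as.character(unlist(hdr[1, ]))
str(dat)
```

With a real file, `text = raw` would be replaced by the file path; any cleaning of the header line can then be done on `hdr` without affecting the column types of `dat`.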

edited Aug 17 '15 at 16:21

answered Aug 17 '15 at 16:11

score 3 · Answer 8 · answered Aug 16 '19 at 21:42

Similar to some of the other answers, here is a dplyr/tidyverse option:

library(tidyverse)

names(df) <- df %>% slice(1) %>% unlist()
df <- df %>% slice(-1)

sbha

score 1 · Answer 9 · answered Jun 04 '18 at 09:05

Using data.table:

library(data.table)

namex <- c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)

t1 <- data.table(namex, row1, row2, row3, row4)
t1 <- data.table(t(t1))

setnames(t1, as.character(t1[1,]))
t1 <- t1[-1,]

score 0 · Answer 10 · answered Jun 16 '21 at 16:26

Building off of Pierre L's answer. Sometimes the first row in a document ends up getting split into two or more rows when pulled into a data frame. This slight modification helped solve that for me.

header.true <- function(df) {
  r1 <- as.character(unlist(df[1,]))
  r2 <- as.character(unlist(df[2,]))
  r1.2 <- paste(r1,r2, sep = ".")
  names(df) <- r1.2
  df[-c(1,2),]
}

Test

df1 <- data.frame(c("a", "xx",1,2,3), c("b", "xx",4,5,6))
header.true(df1)
  a.xx b.xx
3    1    4
4    2    5
5    3    6

score 0 · Answer 11 · edited Feb 12 '23 at 12:47

I think the shortest way is:

colnames(df) <- unlist(df[1, ])

answered Feb 12 '23 at 11:06

Grad Doc