3

When reading a data file into R, I can read it in either as a data.frame or, using the data.table package, as a data.table. I would prefer to use data.table in the future since it handles large data better. However, there are issues with both methods (read.table for data.frames, fread for data.tables), and I'm wondering if there's a simple fix.

When I use read.table to produce a data.frame, if my column names include colons or spaces, they are replaced by periods, which I do not want. I want the column names to be read in "as is."

Alternatively, when I use fread to produce a data.table, my column names are not read in at all, which is obviously not desired.

Check out this gist below for a reproducible example:

https://gist.github.com/jeffbruce/b966d41eedc2662bbd4a

Cheers

user26665
  • Why did you put that in a gist instead of in your question? – nrussell Sep 24 '15 at 17:27
  • I highly recommend you change the names of your columns. You're creating nightmares for yourself by insisting they remain as is. – MichaelChirico Sep 24 '15 at 17:33
  • If what you've got in mind is making your data readable when you create output tables, etc. you can store a separate table to convert between concise column names and ones that are useful to work with. – MichaelChirico Sep 24 '15 at 17:34
  • I don't have a say in what the columns are named. I need to deal with what's been given to me. Also, I used a gist for the data because the data wouldn't fit in the question body. – user26665 Sep 24 '15 at 18:13

2 Answers

10

Here's a solution that might work. I'm not sure whether it's the shortest one, or whether the same can be done with a clever use of drop in data.table, but the hack below does work. The "problem" is the row numbers in your file.

Read in the header first, then add it to the data.table afterwards:

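# Read only the header row; check.names = FALSE keeps spaces and colons in the names
header <- read.table("yourfile.csv", header = TRUE, nrows = 1, check.names = FALSE)
# Read the data rows without a header, then attach the names by reference
indata <- fread("yourfile.csv", skip = 1, header = FALSE)
setnames(indata, colnames(header))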
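
For anyone who wants to try the idea without the original file, here is a minimal self-contained sketch. It assumes a tab-separated file whose header has one fewer field than the data rows because the row numbers are unlabeled (the layout write.table produces by default); the toy data, the temporary file, and the "row" label for the first column are all made up for illustration. If your file labels every column, the extra name is not needed.

```r
library(data.table)

# Hypothetical stand-in for the real file: write.table() emits a header with
# one fewer field than the data rows because the row names have no label.
path <- tempfile(fileext = ".txt")
demo <- data.frame(a = c(1.5, 3.5), b = c(2.5, 4.5))
names(demo) <- c("Primary cortex: trunk region", "hippocampus size")
write.table(demo, path, sep = "\t", quote = FALSE)

# Read only the first line and split it on tabs to get the names verbatim.
header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

# Read the data rows without a header, then attach the names by reference.
indata <- fread(path, skip = 1, header = FALSE, sep = "\t")
# "row" is a made-up label for the unlabeled row-number column.
setnames(indata, c("row", header))

print(names(indata))
```

Splitting the first line by hand sidesteps any name checking, so spaces and colons survive untouched.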
ekstroem
6

By default, read.table converts column names so that they are syntactically valid variable names, which is why spaces and colons are replaced with periods. If you don't want that, pass check.names = FALSE to read.table:

df1<-read.table("data.txt",check.names = FALSE)

sample(colnames(df1),10)
 [1] "simple lobule white matter"                       
 [2] "anterior lobule white matter"                     
 [3] "hippocampus"                                      
 [4] "lateral olfactory tract"                          
 [5] "lobules 1-2: lingula and central lobule (ventral)"
 [6] "Medial parietal association cortex"               
 [7] "Primary somatosensory cortex: trunk region"       
 [8] "midbrain"                                         
 [9] "Secondary auditory cortex: ventral area"          
[10] "Primary somatosensory cortex: forelimb region"  

You can see that the column names are kept as they are.
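
To see the difference side by side, here is a small sketch in base R. The two-column input is invented for illustration and inlined through the text argument so nothing needs to be on disk; the real file has many more columns.

```r
# Hypothetical file contents, inlined so the example is self-contained.
txt <- "Primary cortex: trunk region\thippocampus size\n1.5\t2.5\n3.5\t4.5\n"

# Default behaviour: the names are passed through make.names().
mangled <- read.table(text = txt, header = TRUE, sep = "\t")

# With check.names = FALSE the names are kept verbatim.
as_is <- read.table(text = txt, header = TRUE, sep = "\t", check.names = FALSE)

print(colnames(mangled))
print(colnames(as_is))

# Names that are not valid identifiers need [[ ]] or backticks for access.
print(as_is[["Primary cortex: trunk region"]])
```

Keep in mind that columns named this way cannot be reached with plain `$name` syntax; use `[[ ]]` or backticks instead.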

Dhawal Kapil