I am using R 3.3.1 in RStudio 0.99.903 on a work PC.

I have a tab-separated file that I'm trying to read in with fread. Unfortunately, some of the rows end with a double tab while others don't.

Here's the first few lines of my data:

[1] "1054434\t01-01-2015\t-1\tAMOUNT OWN MUSIC\t12\t\t"               
[2] "1054434\t01-01-2015\t-1\tDVDS\t2\t"                            
[3] "1054434\t01-01-2015\t-1\tINIT TV\t2\t\t"                         
[4] "1054434\t01-01-2015\t-1\tINIT2\t4\t\t"                           
[5] "1054434\t01-01-2015\t-1\tIntro_other_TV\t2\t\t"    

I thought I could get around this problem using the option fill=TRUE, but I get this error message:

test<-fread(filenames[1], header = FALSE, fill = TRUE) 
Error in fread(filenames[1], header = FALSE, fill = TRUE) : 
unused argument (fill = TRUE)

I don't understand why fill doesn't work, as it's definitely a valid option according to the help file.

I am using data.table 1.9.6 from CRAN, as I get this error message when I try to install the GitHub version:

* installing *source* package 'data.table' ...
** libs

*** arch - i386
Warning: running command 'make -f "Makevars" -f "D:/R- 33~1.1/etc/i386/Makeconf" -f "D:/R-33~1.1/share/make/winshlib.mk" SHLIB="data.table.dll" OBJECTS="assign.o bmerge.o chmatch.o dogroups.o fastmean.o fcast.o fmelt.o forder.o frank.o fread.o fsort.o fwrite.o gsumm.o ijoin.o init.o openmp-utils.o quickselect.o rbindlist.o reorder.o shift.o subset.o transpose.o uniqlist.o vecseq.o wrappers.o"' had status 127
ERROR: compilation failed for package 'data.table'
* removing 'D:/R-3.3.1/library/data.table'
Warning in install.packages :
running command '"D:/R-33~1.1/bin/x64/R" CMD INSTALL -l "D:\R-3.3.1\library"    C:\Users\swiftc47\AppData\Local\Temp\RtmpeYBevK/downloaded_packages/data.table_1.9.7.tar.gz' had status 1
Warning in install.packages :
installation of package ‘data.table’ had non-zero exit status
chrisjacques

2 Answers

1

There is no fill option for 1.9.6 -- try updating to the current CRAN version (1.9.8+) where fill = TRUE works fine:

fread("test.tsv", fill = TRUE)
#         V1         V2 V3               V4 V5 V6 V7
# 1: 1054434 01-01-2015 -1 AMOUNT OWN MUSIC 12 NA NA
# 2: 1054434 01-01-2015 -1             DVDS  2 NA NA
# 3: 1054434 01-01-2015 -1          INIT TV  2 NA NA
# 4: 1054434 01-01-2015 -1            INIT2  4 NA NA
# 5: 1054434 01-01-2015 -1   Intro_other_TV  2 NA NA

where test.tsv is your file.
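Since the error says the argument is unused, it may help to confirm what the installed version actually accepts before upgrading. A minimal sketch using only base R; `has_argument` is a hypothetical helper name, not part of data.table:

```r
# Hypothetical helper: does function `fn` accept an argument named `arg`?
has_argument <- function(fn, arg) {
  arg %in% names(formals(fn))
}

# Demonstrated on base R so the snippet runs without data.table installed
print(has_argument(read.table, "fill"))

# With data.table installed, the same check applies to fread
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
  print(has_argument(data.table::fread, "fill"))
}
```

If the second check prints FALSE, the help page being read probably belongs to a newer version than the one that is loaded.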

Barring that, you can use command line tools to trim the trailing whitespace; I'm not facile with sed, so I'm using this question as reference:

fread("sed 's/[ \t]*$//' test.tsv")
#         V1         V2 V3               V4 V5
# 1: 1054434 01-01-2015 -1 AMOUNT OWN MUSIC 12
# 2: 1054434 01-01-2015 -1             DVDS  2
# 3: 1054434 01-01-2015 -1          INIT TV  2
# 4: 1054434 01-01-2015 -1            INIT2  4
# 5: 1054434 01-01-2015 -1   Intro_other_TV  2

A final option is to replace the double \t with a single one, in case you wanted a column of NA:

fread("sed 's/[ \t][ \t]$/\t/' test.tsv")
#         V1         V2 V3               V4 V5 V6
# 1: 1054434 01-01-2015 -1 AMOUNT OWN MUSIC 12 NA
# 2: 1054434 01-01-2015 -1             DVDS  2 NA
# 3: 1054434 01-01-2015 -1          INIT TV  2 NA
# 4: 1054434 01-01-2015 -1            INIT2  4 NA
# 5: 1054434 01-01-2015 -1   Intro_other_TV  2 NA
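If sed is not available (for example on a locked-down Windows machine), the same trimming can be done inside R. This is only a sketch under assumptions: the sample rows are copied from the question, the temporary file names are made up for illustration, and base R's read.table stands in for fread so that the snippet runs without data.table.

```r
# Sample rows mirroring the question's data (trailing tabs vary per row)
lines <- c(
  "1054434\t01-01-2015\t-1\tAMOUNT OWN MUSIC\t12\t\t",
  "1054434\t01-01-2015\t-1\tDVDS\t2\t",
  "1054434\t01-01-2015\t-1\tINIT TV\t2\t\t"
)
raw_file <- tempfile(fileext = ".tsv")
writeLines(lines, raw_file)

# Strip any run of trailing tabs or spaces, as the first sed command does
cleaned <- sub("[ \t]+$", "", readLines(raw_file))

# Write the cleaned rows to a new file and read that instead
clean_file <- tempfile(fileext = ".tsv")
writeLines(cleaned, clean_file)

dat <- read.table(clean_file, sep = "\t", header = FALSE, quote = "",
                  stringsAsFactors = FALSE)
print(dat)

# The cleaned file can equally be handed to fread:
# data.table::fread(clean_file, header = FALSE)
```

Reading the whole file with readLines costs more memory than piping through sed, so this is mainly useful when the file fits comfortably in RAM.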
MichaelChirico
  • Thanks Michael, the sed thing is a really interesting feature. Unfortunately I wasn't able to get it to work on my computer. I eventually managed to install data.table 1.9.7 from GitHub, but that involved reverting to R 3.2.2 and installing Rtools32. fill=TRUE works perfectly when the extra columns are at the start of the file, but not when they first appear in lines further along the file. – chrisjacques Aug 15 '16 at 10:29
  • @chrisjacques the above works on your example; you'll have to isolate the difference between your current example and actual file in order for me to help further – MichaelChirico Aug 15 '16 at 12:38
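One way to isolate the difference is to count the fields on every line and see where the count changes. The sketch below uses base R's count.fields on a small sample copied from the question; the temporary file is an assumption for illustration, so point it at the real file instead. A possible explanation for the behaviour described, which is a guess rather than something confirmed in this thread, is that the number of columns is settled from the early rows, so wider rows that first show up later do not fit.

```r
# Sample rows mirroring the question's data; replace with the real file path
lines <- c(
  "1054434\t01-01-2015\t-1\tAMOUNT OWN MUSIC\t12\t\t",
  "1054434\t01-01-2015\t-1\tDVDS\t2\t",
  "1054434\t01-01-2015\t-1\tINIT TV\t2\t\t"
)
sample_file <- tempfile(fileext = ".tsv")
writeLines(lines, sample_file)

# Fields per line; blank.lines.skip = FALSE keeps positions aligned
# with line numbers in the file
n_fields <- count.fields(sample_file, sep = "\t", quote = "",
                         comment.char = "", blank.lines.skip = FALSE)

# How many lines have each field count
print(table(n_fields))

# Line numbers whose field count differs from the first line
odd_rows <- which(n_fields != n_fields[1])
print(odd_rows)
```

Looking at the first few entries of `odd_rows` in the real file should show where the layout starts to differ from the posted example.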
-2

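You can do this with base R's read.table, setting sep so that values containing spaces (such as AMOUNT OWN MUSIC) stay in a single column:

read.table(filenames[1], sep = "\t", fill = TRUE)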
myk_raniu