38

I have a bunch of .csv.bz2 files that I have to download, extract, and read in R. I downloaded one file and want to extract it to the current working directory and then read it. I tried unz(filename,filename.csv), but it does not seem to work. How can I do that?

I heard somewhere that bz2 files can be read directly without decompressing them first. How can I do that?

Prabhu

5 Answers

39

You can use either of these two commands:

  1. read.csv(): pass the name of the compressed CSV file directly.

    read.csv("file.csv.bz2")

  2. read.table(): this is the generic version of read.csv(). You set the delimiter and the other options that read.csv() sets automatically. You don't need to decompress the file separately; the function does it for you.

    read.table("file.csv.bz2", header = TRUE, sep = ",", quote = "\"", ...)
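To check this on your own machine, here is a minimal round-trip sketch in base R. The temporary path and the toy data frame are made up for illustration; the point is only that both calls read the same compressed file without a separate decompression step.

```r
# Write a small data frame to a temporary bzip2-compressed CSV.
# The path and the data are invented for this illustration.
path <- tempfile(fileext = ".csv.bz2")
original <- data.frame(id = 1:3, name = c("a", "b", "c"))
con <- bzfile(path, open = "wt")
write.csv(original, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses on the fly.
from_csv <- read.csv(path)

# read.table() reads the same file once the CSV options are spelled out.
from_table <- read.table(path, header = TRUE, sep = ",", quote = "\"")
```

Both data frames should have the same three rows as `original`.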

Amrit Shrestha
27

Like this:

readcsvbz2file <- read.csv(bzfile("file.csv.bz2"))
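
Since the question mentions a bunch of files, the same call can be applied to every .csv.bz2 file in a directory. This is only a sketch: `read_all_bz2` is a hypothetical helper name, and it assumes the files are already downloaded (for example with `download.file(url, destfile, mode = "wb")`).

```r
# Hypothetical helper: read every .csv.bz2 file in a directory
# into a list of data frames named after the files.
read_all_bz2 <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv\\.bz2$", full.names = TRUE)
  # bzfile() creates a bzip2 connection; read.csv() opens it,
  # reads the data, and closes it again.
  data <- lapply(files, function(f) read.csv(bzfile(f)))
  names(data) <- basename(files)
  data
}
```

Calling `read_all_bz2()` with no arguments reads from the current working directory. If all files share the same columns, `do.call(rbind, read_all_bz2())` would stack them into one data frame.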
Komal Rathi
  • `bzfile()` is not necessary; `read.csv()` can handle compressed files automatically, so it's just `read.csv("file.csv.bz2")`. [Here is an example](http://rpubs.com/Noseshine/77486) (first section, "Loading the Data"). – Mörre Apr 27 '15 at 07:15
  • `bzfile()` is a more general solution because it is useful for other formats. Thanks – Charles Santana Mar 20 '20 at 00:57
11

You can use the very fast fread() from data.table, which has built-in support for bz2-compressed files (reading them requires the R.utils package to be installed):

library(data.table)
fread("file.csv.bz2")
MichaelChirico
user2161065
8

Basically, you need to type:

library(R.utils)
bunzip2("dataset.csv.bz2", "dataset.csv", remove = FALSE, skip = TRUE)

dataset <- read.csv("dataset.csv")

See documentation here: bunzip2 {R.utils}.
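
If you would rather not depend on R.utils, a decompressed copy can also be written with base R connections. This is a rough sketch, not what bunzip2() does internally; `decompress_bz2` is a hypothetical helper name and the chunk size is an arbitrary choice.

```r
# Hypothetical helper: write a decompressed copy of a .bz2 file
# next to it and leave the original in place.
decompress_bz2 <- function(src, dest = sub("\\.bz2$", "", src)) {
  input <- bzfile(src, open = "rb")
  on.exit(close(input))
  output <- file(dest, open = "wb")
  on.exit(close(output), add = TRUE)
  repeat {
    # Copy the decompressed bytes in chunks to keep memory use small.
    chunk <- readBin(input, what = raw(), n = 1e6)
    if (length(chunk) == 0) break
    writeBin(chunk, output)
  }
  invisible(dest)
}
```

After `decompress_bz2("dataset.csv.bz2")`, the file `dataset.csv` can be read with `read.csv("dataset.csv")` as above.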

DrKaoliN
4

According to the read.table documentation, a compressed file can be read directly. For a CSV file, set the header and separator explicitly:

read.table("file.csv.bz2", header = TRUE, sep = ",")
Miha Trošt