Software: RStudio
Version: 0.98.1102
Operating System: Windows 7 Professional
Issue #1: I have a .txt file that is 100MB+. It has 4 variables and over 500,000 observations for each variable.
Issue #2: Suppose column1 holds dates but is read in as a factor. Is it possible to change the class of column1 alone to Date using the colClasses argument of read.csv()?
I currently read the file with:
mydata <- read.csv("myfile", sep = ";", na.strings = "?", stringsAsFactors = FALSE)
Issue #1
With the call above, the load never finishes on my computer, presumably because of the size of the file.
The file has the following format:
column1 column2 column3
dog bird apple
cat dove orange
rat sparrow kiwi
may bird apple
cat dove orange
rat sparrow kiwi
I'm trying to figure out how to do the following:
1. Read only the rows of the data set where column1 is "dog"
2. Read only the rows of the data set where column1 is "dog" and column2 is "bird"
What I have tried so far:
1. I read that I can load the entire data set and then subset it, but I would really like to avoid that, because the file is too large to load in the first place. Instead, I would like to load only the rows that match specific criteria.
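For illustration, one way to do this in base R is to read the file in chunks and keep only the matching rows from each chunk. This is only a rough sketch, not a tested solution for the real file: read_filtered, keep and chunk_size are hypothetical names, and it assumes the file has a header row, uses ";" as the separator, and has no quoted fields that span several lines.

```r
# Sketch: read a large delimited file chunk by chunk, keeping only matching rows.
# Assumptions: header row present, ";" separator, no multi-line quoted fields.
read_filtered <- function(path, keep, sep = ";", chunk_size = 50000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once to recover the column names.
  header <- scan(text = readLines(con, n = 1L), what = "", sep = sep, quiet = TRUE)

  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break  # end of file

    # Parse only this chunk, reusing the column names from the header.
    chunk <- read.csv(text = lines, sep = sep, header = FALSE,
                      col.names = header, na.strings = "?",
                      stringsAsFactors = FALSE)

    # which() drops rows where the condition evaluates to NA.
    pieces[[length(pieces) + 1L]] <- chunk[which(keep(chunk)), , drop = FALSE]
  }

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

With the example columns above, the two cases could then be requested like this ("myfile" is the placeholder file name from the read.csv() call):

```r
dogs      <- read_filtered("myfile", function(d) d$column1 == "dog")
dog_birds <- read_filtered("myfile", function(d) d$column1 == "dog" & d$column2 == "bird")
```

Only one chunk is held in memory at a time, so peak memory depends on chunk_size and on how many rows match rather than on the total file size.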
Issue #2
Assume column1 is in the form 05/01/2015 but has the class "factor". Is it possible to change the class of only column1 to "Date" using the colClasses argument of read.csv()? Perhaps something like this?
mydata <- read.csv("myfile", sep = ";", na.strings = "?",
                   stringsAsFactors = FALSE, colClasses = c(column1 = as.date(column1)))
Or perhaps something like this?
mydata <- read.csv("myfile", sep = ";", na.strings = "?",
                   stringsAsFactors = FALSE, colClasses = c(column1 = strptime(column1 %MM%DD%YY)))
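For reference, colClasses expects class names given as character strings rather than function calls, so neither attempt above would run as written. The sketch below shows two variations that might work instead. It assumes that 05/01/2015 means month/day/year and that the header names the column column1. The small temporary file stands in for the real data, and "mdyDate" is a made-up class name used only for this example.

```r
library(methods)  # provides setClass() and setAs()

# Small stand-in for the real file (assumed ";" separator, mm/dd/yyyy dates).
tmp <- tempfile(fileext = ".txt")
writeLines(c("column1;column2",
             "05/01/2015;bird",
             "12/31/2015;dove",
             "?;sparrow"), tmp)

# Option A: read column1 as plain text, then convert it after loading.
mydata <- read.csv(tmp, sep = ";", na.strings = "?", stringsAsFactors = FALSE,
                   colClasses = c(column1 = "character"))
mydata$column1 <- as.Date(mydata$column1, format = "%m/%d/%Y")

# Option B: register a character-to-"mdyDate" conversion so that read.csv()
# can apply it through colClasses. "mdyDate" is a hypothetical class name.
setClass("mdyDate")
setAs("character", "mdyDate",
      function(from) as.Date(from, format = "%m/%d/%Y"))
mydata2 <- read.csv(tmp, sep = ";", na.strings = "?", stringsAsFactors = FALSE,
                    colClasses = c(column1 = "mdyDate"))
```

Option A keeps the read.csv() call simple and makes the date format explicit in one place. Option B does the conversion during the read, at the cost of defining a helper class first.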