I am trying to rename the column headings and remove some variables (columns) from a large .csv file. The file is just under 4 GB, and R can't load it because my computer runs out of memory.
I have written the code to clean the data:

```r
# Install plyr once (only needed the first time):
# install.packages("plyr")
library(plyr)

# Label all of the columns (old name = new name)
house_all_name <- rename(house_all, c(V1 = "ID", V2 = "Price Paid", V3 = "Date Sold",
                                      V4 = "Post Code", V5 = "Property Type",
                                      V6 = "New Build?", V7 = "Tenure",
                                      V8 = "House Name/Number (PAON)", V9 = "SAON",
                                      V10 = "Street", V11 = "Locality", V12 = "Town/City",
                                      V13 = "District", V14 = "County",
                                      V15 = "PPD Category Type", V16 = "Record Status"))

# Remove the non-useful variables: column 1 (ID) and columns 8 to 16
house_clean <- house_all_name[, -c(1, 8:16)]
str(house_clean)
```
I tried to read the whole file with the following code, but my computer became very slow and ran out of memory:

```r
house_all <- read.table("pp-complete.csv", header = FALSE, sep = ",", fill = TRUE)
```
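A likely reason the full read runs out of memory is that `read.table` keeps all 16 columns, even though ten of them are dropped straight afterwards. A minimal base-R sketch, assuming the file has no header row and exactly the 16 columns named in the renaming code: in `colClasses`, the entry `"NULL"` tells `read.csv` to skip a column entirely, and `col.names` applies the labels at read time (`check.names = FALSE` keeps the spaces in the names). Reading the kept columns as `"character"` also stops single-letter codes such as `"T"` or `"F"` from being guessed as logical; the price is converted to a number afterwards. `read_house`, `all_names` and `col_types` are hypothetical names, and the two data rows are made up. Whether six columns of the real file fit in RAM still depends on your machine.

```r
# Hypothetical labels for all 16 columns, in file order (same as the renaming code)
all_names <- c("ID", "Price Paid", "Date Sold", "Post Code", "Property Type",
               "New Build?", "Tenure", "House Name/Number (PAON)", "SAON",
               "Street", "Locality", "Town/City", "District", "County",
               "PPD Category Type", "Record Status")

# "NULL" makes read.csv skip a column entirely; the six kept columns are read as text
col_types <- c("NULL", rep("character", 6), rep("NULL", 9))

# Hypothetical helper: read only the wanted columns, already labelled
read_house <- function(path) {
  house <- read.csv(path, header = FALSE, col.names = all_names,
                    colClasses = col_types, check.names = FALSE)
  # Convert the price afterwards (assumes the field holds plain numbers)
  house[["Price Paid"]] <- as.numeric(house[["Price Paid"]])
  house
}

# Usage on a tiny made-up stand-in for pp-complete.csv
demo <- tempfile(fileext = ".csv")
writeLines(c(
  '"{A1}","100000","1995-01-01 00:00","AB1 2CD","T","N","F","1","","HIGH ST","","TOWN","DIST","COUNTY","A","A"',
  '"{A2}","250000","2001-06-15 00:00","EF3 4GH","D","Y","L","22","","MAIN RD","","CITY","DIST","COUNTY","A","A"'
), demo)
house_clean <- read_house(demo)
str(house_clean)
```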
So, to write the cleaning code, I had to 'practice' on the first 5 rows only:

```r
house_all <- read.table("pp-complete.csv", header = FALSE, sep = ",", fill = TRUE, nrows = 5)
```
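Working on the first 5 rows is a sensible way to develop the cleaning step. One way to make that work reusable is to wrap the renaming and column dropping in a function, so the exact same code can later be run on each piece of the full file. This sketch swaps plyr's `rename` for base R's `names<-` (so no package is needed) and assumes the raw data has exactly 16 columns in file order; `clean_house` and `house_names` are hypothetical names, and the demo rows are made up.

```r
# Hypothetical labels for the 16 raw columns V1..V16, in file order
house_names <- c("ID", "Price Paid", "Date Sold", "Post Code", "Property Type",
                 "New Build?", "Tenure", "House Name/Number (PAON)", "SAON",
                 "Street", "Locality", "Town/City", "District", "County",
                 "PPD Category Type", "Record Status")

# Hypothetical helper: the same cleaning as the plyr code, in base R
clean_house <- function(raw) {
  # Assumes raw has exactly 16 columns, in file order
  stopifnot(ncol(raw) == length(house_names))
  names(raw) <- house_names              # label all of the columns
  raw[, -c(1, 8:16), drop = FALSE]       # drop ID and columns 8 to 16
}

# Practice on a made-up 2-row stand-in for pp-complete.csv
demo <- tempfile(fileext = ".csv")
writeLines(c(
  '"{A1}","100000","1995-01-01 00:00","AB1 2CD","T","N","F","1","","HIGH ST","","TOWN","DIST","COUNTY","A","A"',
  '"{A2}","250000","2001-06-15 00:00","EF3 4GH","D","Y","L","22","","MAIN RD","","CITY","DIST","COUNTY","A","A"'
), demo)
practice <- read.table(demo, header = FALSE, sep = ",", fill = TRUE, nrows = 5)
house_clean <- clean_house(practice)
str(house_clean)
```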
From my research, I believe it is possible to read the file line by line, but I don't know how!
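Reading "line by line" in R is usually done in chunks from an open connection: when `read.csv` is given a connection that is already open, each call carries on from where the previous one stopped, and `nrows` sets the chunk size. A minimal sketch under the same 16-column, no-header assumption: each chunk keeps only the six wanted columns, gets labelled, and is appended to a new CSV, so only one chunk is held in memory at a time. `clean_in_chunks`, its arguments and the 500,000-row default are hypothetical choices, and the demo rows are made up. The cleaned output file, being much smaller, may then be small enough to load in one go.

```r
# Hypothetical helper: stream the raw file through in chunks, keep six
# columns, label them, and append each chunk to a new, smaller CSV.
clean_in_chunks <- function(in_file, out_file, chunk_size = 500000L) {
  keep_names <- c("Price Paid", "Date Sold", "Post Code",
                  "Property Type", "New Build?", "Tenure")
  # Assumes 16 columns: skip column 1 and columns 8-16, read the rest as text
  col_types <- c("NULL", rep("character", 6), rep("NULL", 9))

  con <- file(in_file, open = "r")
  on.exit(close(con))
  first <- TRUE
  repeat {
    # On an already-open connection, read.csv continues from where the
    # previous call stopped. At end of file it signals "no lines available
    # in input", which becomes NULL here (this also hides any other read error).
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, colClasses = col_types),
      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0) break

    names(chunk) <- keep_names
    # Write the header with the first chunk only, then append without it
    write.table(chunk, out_file, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
    if (nrow(chunk) < chunk_size) break  # last, partial chunk
  }
  invisible(out_file)
}

# Usage on a made-up 5-row stand-in for pp-complete.csv, 2 rows per chunk
demo_in  <- tempfile(fileext = ".csv")
demo_out <- tempfile(fileext = ".csv")
writeLines(sprintf(
  '"{ID%d}","%d","2001-06-15 00:00","AB1 2CD","T","N","F","%d","","HIGH ST","","TOWN","DIST","COUNTY","A","A"',
  1:5, 1:5 * 100000L, 1:5), demo_in)
clean_in_chunks(demo_in, demo_out, chunk_size = 2L)
result <- read.csv(demo_out, check.names = FALSE, colClasses = "character")
str(result)
```

Catching every error to detect the end of the file is a shortcut; if the output looks short, running a single `read.csv` call on the open connection without the `tryCatch` will show whether a malformed line stopped the loop early.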
Regards, Tommy
P.S. The data file can be found at http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv