0

I have uploaded a data set which is called as "Obtained Dataset", it usually has 16 rows of numeric and character variables, some other files of similar nature have less than 16 characters, each variable is the header of the data which starts from the 17th row and onwards "in this specific file".

Obtained dataset & Required Dataset

enter image description here

For the data that starts 1st column is the x-axis, 2nd column is y-axis and 3rd column is depth (which are standard for all the files in the database) 4th column is GR 1 LIN, 5th column is CAL 1 LIN so and soforth as given in the first 16 rows of the data.

Now i want an R code which can convert it into the format shown in the required data set, also if a different data set has say less than 16 lines of names say GR 1 LIN and RHOB 1 LIN are missing then i want it to still create a column with NA entries till 1:nrow.

Currently i have managed to export this file to excel and manually clean the data and rename the columns correspondingly and then save it as csv and then read.csv("filename") etc but it is simply not possible to do this for 400 files.

Any advice how to proceed will be of great help.

tonytonov
  • 25,060
  • 16
  • 82
  • 98
  • 3
    I suggest you narrow down your question to a specific problem that is preventing you from advancing in writing your code. I don't think we can work with an image of a data set. Writing code for such a think would tedious, something that should be done only against compensation. In short, provide a [small reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with the code that you've got so far. – Roman Luštrik Aug 08 '14 at 07:12

1 Answers1

1

I have noticed that you have probably posted this question again, and in a different format. This is a public forum, and people are happy to help. However, it's your job to simplify life of others, and you are requested to put in some effort. Here is some advice on that.

Having said that, here is some code I have written to help you out.

Step0: Creating your first data set:

sink("test.txt")  # This will `sink` all the output to the file "test.txt"

# Lets start with some dummy data
cat("1\n")
cat("DOO\n")
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))

# Now a 10 x 16 dummy data matrix:
cat(paste(apply(matrix(sample(160),10),1,paste,collapse = "\t"),collapse = "\n"))
cat("\n")

sink()            # This will stop `sink`ing.

I have created some dummy data in first 6 lines, and followed by a 10 x 16 data matrix.

Note: In principle you should have provided something like this, or a copy of your dataset. This would help other people help you.

Step1: Now we need to read the file, and we want to skip the first 6 rows with undesired info:

(temp <- read.table(file="test.txt", sep ="\t", skip = 6))

Step2: Data clean up: We need a vector with names of the 16 columns in our data:

namesVec <- letters[1:16]

Now we assign these names to our data.frame:

names(temp) <- namesVec
temp

Looks good!

Step3: Save the data:

write.table(temp,file="test-clean.txt",row.names = FALSE,sep = "\t",quote = FALSE)

Check if the solution is working. If it is working, than move to next step, otherwise make necessary changes.

Step4: Automating:

First we need to create a list of all the 400 files. The easiest way (to explain also) is copy the 400 files in a directory, and then set that as working directory (using setwd).

Now first we'll create a vector with all file names:

fileNameList <- dir()

Once this is done, we'll need to function to repeat step 1 through 3:

convertFiles <- function(fileName) {
  temp <- read.table(file=fileName, sep ="\t", skip = 6)
  names(temp) <- namesVec  
  write.table(temp,file=paste("clean","test.txt",sep="-"),row.names = FALSE,sep = "\t",quote = FALSE)
}

Now we simply need to apply this function on all the files we have:

sapply(fileNameList,convertFiles)

Hope this helps!

Community
  • 1
  • 1
Shambho
  • 3,250
  • 1
  • 24
  • 37