I have a text file of values, each containing one, two, or sometimes three decimal points. The values are generated by software from the signal intensity of genes. When I tried to compute a distance matrix from them, I got this warning message:

Warning message:
In dist(sam) : NAs introduced by coercion

A sample text file is given below:

sample1
a 23.45.12
b 123.345.234
c 45.2311.34

I need to convert these values to numbers with a single decimal point (real numbers) so that I can compute a distance matrix from them and use it for clustering. My expected result is as follows:

  sample1                

a 23.45
b 123.345
c 45.2311

Please help me.

Dinesh
  • Do you mean that the "." is the value separating the three values? If not, what number or numbers does "23.45.12" represent? – Aaron left Stack Overflow Feb 28 '12 at 12:19
  • @Aaron These values are generated by a machine, and each one represents a single value. – Dinesh Feb 28 '12 at 12:22
  • But there are two decimal points in it. – Aaron left Stack Overflow Feb 28 '12 at 12:23
  • @Thileepan Numbers with two decimal points aren't numbers, at least in the Western world and, more importantly, as far as R is concerned. So, what do your "two-decimal" values refer to in the world of real numbers? – Gavin Simpson Feb 28 '12 at 12:29
  • @Aaron and @Gavin Simpson Actually, these values are expression values of genes, generated by software that converts the signals into numerical values. – Dinesh Feb 28 '12 at 12:35
  • @Thileepan R doesn't care if those numbers mean anything to you; it is expecting numbers with a single decimal point, otherwise these are not numbers according to R. Now either you tell us how to convert these values into a decimal number or I'm going to vote to close this Q as you are being obtuse and not listening to what we are telling you, and without conversion, there is **no** way to Answer your question. – Gavin Simpson Feb 28 '12 at 12:40
  • If you just want to drop the last decimal point and anything after it, try `strsplit()` on `.` and stick the first two parts back together. `splt <- strsplit(vector, "\\.")`, where `vector` is your column of strings. Then do `sapply(splt, function(x) as.numeric(paste(x[1], ".", x[2], sep = "")))`. E.g. `splt <- strsplit("23.45.12", "\\.")`, then `sapply(splt, function(x) as.numeric(paste(x[1], ".", x[2], sep = "")))` gives `[1] 23.45`, which is numeric. No rounding though, just truncation. – Gavin Simpson Feb 28 '12 at 12:43
  • @Gavin Simpson I changed my question as per your suggestions. Thank you for your help. – Dinesh Feb 28 '12 at 12:48
  • Question doesn't make sense. It mentions 3 decimal points whereas the example has only two, there is no code that has been tried, no reproducible example, and no reasonable way to round a number with two/three decimal points to a number with only one decimal point (that would be worth a Fields Medal, btw). – Joris Meys Feb 28 '12 at 12:48
  • @Thileepan Wait, now you want us to convert some numeric code you didn't explain to a random numerical value? Explain what the numbers mean. 'They're generated by a machine' doesn't cut it. `rnorm(100)` is also generated by a machine. – Joris Meys Feb 28 '12 at 12:51
  • @Thileepan Your changed question doesn't address the problem. Unless you tell us how to map from "23.45.12" to a real number, we can't help. I have already shown you in the comments how to drop the last decimal. Is that what you wanted? If not, people are going to vote to close, as your question is meaningless unless you tell us how the values map onto the real numbers. – Gavin Simpson Feb 28 '12 at 12:57
  • @GavinSimpson I have changed my entire question. I need these values either with one decimal point or as real numbers. Is this fine? – Dinesh Feb 28 '12 at 13:25
  • Voting to close. @Thileepan The only way you can rescue the situation is to provide sample data and expected results. You have provided sample data. What is the expected result? – Andrie Feb 28 '12 at 13:31
  • @Andrie I have also given my expected results. – Dinesh Feb 28 '12 at 13:37
  • OK, so you want to take the first period as the decimal point, and ignore everything after the second period? – Andrie Feb 28 '12 at 13:59
  • @Andrie Yes. Please do help me. – Dinesh Feb 28 '12 at 14:03
  • OK. The question finally makes sense. Answer posted. – Andrie Feb 28 '12 at 14:14

1 Answer

You can do this in one line of code with as.numeric and gsub with a suitable regular expression:

sample1 <- c(
  a = "23.45.12",
  b = "123.345.234",
  c = "45.2311.34"
)

as.numeric(
  gsub("(\\d+\\.\\d+)\\..*", "\\1", sample1)
)

[1]  23.4500 123.3450  45.2311

The regular expression:

  • \\d+ finds one or more digits
  • \\. finds a literal period
  • Thus (\\d+\\.\\d+) finds two sets of digits with a period in between, and then groups them (with the brackets)
  • Finally, \\..* finds a period followed by any remaining characters

Then gsub replaces the entire string with only what was found inside the brackets. This is called a regular expression back reference, indicated by \\1.
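As a rough sketch of how this could be applied to a whole table before calling dist, the example below wraps the same regular expression in a helper and runs it over every column. The helper name `clean_value`, the data frame `expr`, and the `sample2` column are made up for illustration and are not part of the question's data; it also assumes every column holds these dotted values as character strings.

```r
# Hypothetical helper: keep the digits up to the second period, drop the rest.
# Values that contain only one period do not match and pass through unchanged.
clean_value <- function(x) {
  as.numeric(sub("^(\\d+\\.\\d+)\\..*$", "\\1", x))
}

# Made-up stand-in for the text file; sample2 is invented for this sketch.
expr <- data.frame(
  sample1 = c("23.45.12", "123.345.234", "45.2311.34"),
  sample2 = c("10.5", "99.1.7", "3.25.1.8"),
  row.names = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Convert every column in place; assigning into cleaned[] keeps the row names.
cleaned <- expr
cleaned[] <- lapply(expr, clean_value)

# All columns are now numeric, so dist() no longer has to coerce strings.
d <- dist(cleaned)
```

If any value does not fit the digits-period-digits pattern at all (for example an empty field or text), `as.numeric` would still return NA for it, so checking `anyNA(cleaned)` before clustering is probably worthwhile.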

Andrie
  • I tried to get the distance matrix using the dist function, but the matrix is filled only with NAs. If you like, I can mail you the text file for which I want to create a distance matrix. – Dinesh Feb 28 '12 at 15:09
  • @Thileepan I suggest you read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for tips on how to provide all of the relevant data when posting a question. If you are still stuck, then it's possibly because you haven't provided us with the correct information. But, by all means, email me. You'll find that I charge very sensible commercial rates for support. – Andrie Feb 28 '12 at 15:47