R script to format datatable to exactly 2 decimal places

Question

I have made a datatable "Event_Table" with 46 rows and 6 columns. At some point I export this to a text file and would like the output of some fields to be rounded to exactly 2 decimal places.

Event_Table[1:34,3:6]=round(Event_Table[1:34,3:6])
Event_Table[36:39,3:6]=format(round(Event_Table[36:39,3:6],2), nsmall=2) 
Event_Table[41:46,3:6]=format(round(Event_Table[41:46,3:6],2), nsmall=2)

Lines 1 and 2 produce the desired result, but subsequently running line 3 throws an error:

Error in Math.data.frame(list(CO = c("0", "0", "0.786407766990291", "0",  : 
non-numeric variable in data frame: CONCONATotal

Why? If I remove line 2, then line 3 runs fine. So something about setting the formatting in one part of the table is affecting the entire table and prevents a second format command from being possible (even though the formatting is only being applied to discrete parts of the table). Any ideas how to avoid this, or how to achieve what is required in a different way?

EDIT:

I should perhaps add that the following code is not quite sufficient:

Event_Table[36:46,3:6]=round(Event_Table[36:46,3:6], digits=2)

Trailing zeros are dropped, i.e. a value of 1 is displayed as "1", not as "1.00", and the latter is what is required.
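
The behaviour described here follows from `round()` returning a number, which carries no trailing zeros, while `format()` and `sprintf()` return character strings. A minimal sketch, using a made-up vector `x` rather than the real table:

```r
# Hypothetical values standing in for a few cells of Event_Table
x <- c(1, 0.786407766990291, 0.5)

rounded   <- round(x, digits = 2)         # still numeric; trailing zeros are not stored
formatted <- format(rounded, nsmall = 2)  # character; at least 2 decimals shown
fixed     <- sprintf("%.2f", x)           # character; exactly 2 decimals

print(rounded)
print(formatted)
print(fixed)
```

Note that `format()` also pads the strings of a vector to a common width, whereas `sprintf("%.2f", ...)` does not.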

EDIT2:

Here is the table:

    ChrSize   Chr                       CO        NCO       NA  Total
1   230218    1                         4.00      1.00      0   5.00
2   813184    2                         6.00      6.00      0   12.00
3   316620    3                         2.00      3.00      0   5.00
4   1531933   4                         13.00     20.00     0   33.00
5   576874    5                         3.00      8.00      0   11.00
6   270161    6                         4.00      2.00      0   6.00
7   1090940   7                         11.00     5.00      0   16.00
8   562643    8                         5.00      9.00      0   14.00
9   439888    9                         6.00      3.00      0   9.00
10  745751    10                        10.00     6.00      0   16.00
11  666816    11                        3.00      7.00      0   10.00
12  1078177   12                        11.00     13.00     1   25.00
13  924431    13                        7.00      12.00     0   19.00
14  784333    14                        5.00      6.00      1   12.00
15  1091291   15                        6.00      17.00     0   23.00
16  948066    16                        7.00      6.00      0   13.00
17  12071326  TOTAL                     103.00    124.00    2   229.00
18  NA        Event Lengths:            NA        NA        NA  NA
19  NA        Min Len                   0.00      22.00     0   0.00
20  NA        Max Len                   14745.00  12524.00  0   14745.00
21  NA        Mean Len                  2588.00   1826.00   0   2153.00
22  NA        Median Len                1820.00   1029.00   0   1322.00
23  NA        Chromatids:               NA        NA        NA  NA
24  NA        1_chrom                   0.00      98.00     2   100.00
25  NA        2_chrom                   81.00     22.00     0   103.00
26  NA        3_chrom                   14.00     4.00      0   18.00
27  NA        4_chrom                   8.00      0.00      0   8.00
28  NA        Classe:                   NA        NA        NA  NA
29  NA        1_1brin                   0.00      55.00     0   55.00
30  NA        1_2brins                  0.00      43.00     2   45.00
31  NA        2_nonsis                  81.00     15.00     0   96.00
32  NA        2_sis                     0.00      7.00      0   7.00
33  NA        classe_3                  14.00     4.00      0   18.00
34  NA        classe_4                  8.00      0.00      0   8.00
35  NA        Fraction of Chromatids:   NA        NA        NA  NA
36  NA        1_chrom                   0.00      0.79      1   0.44
37  NA        2_chrom                   0.79      0.18      0   0.45
38  NA        3_chrom                   0.14      0.03      0   0.08
39  NA        4_chrom                   0.08      0.00      0   0.03
40  NA        Fraction of each Classe:  NA        NA        NA  NA
41  NA        1_1brin                   0.00      0.44      0   0.24
42  NA        1_2brins                  0.00      0.35      1   0.20
43  NA        2_nonsis                  0.79      0.12      0   0.42
44  NA        2_sis                     0.00      0.06      0   0.03
45  NA        classe_3                  0.14      0.03      0   0.08
46  NA        classe_4                  0.08      0.00      0   0.03

I require rows 1-34 formatted without decimals, and rows 36-46 formatted with precisely 2 decimal places for all values.
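
One way to meet this requirement, sketched under the assumption that the table is kept numeric until just before export, is to build a character copy and format each row range with `sprintf()`. The data frame `tbl`, the row ranges and the column names below are made up for illustration; in the real table the ranges would be 1:34 and 36:46 and the columns 3:6.

```r
# Hypothetical stand-in for Event_Table: one label column, two numeric columns
tbl <- data.frame(
  Chr   = c("1", "TOTAL", "1_chrom", "2_chrom"),
  CO    = c(4, 103, 0, 0.786407766990291),
  Total = c(5, 229, 0.436681222707424, 0.449781659388646),
  stringsAsFactors = FALSE
)

count_rows    <- 1:2   # rows shown without decimals (1:34 in the question)
fraction_rows <- 3:4   # rows shown with 2 decimals (36:46 in the question)
num_cols      <- c("CO", "Total")

# Build a character copy for export; tbl itself stays numeric
out <- tbl
out[num_cols] <- lapply(tbl[num_cols], function(col) {
  txt <- rep(NA_character_, length(col))
  txt[count_rows]    <- sprintf("%.0f", col[count_rows])
  txt[fraction_rows] <- sprintf("%.2f", col[fraction_rows])
  txt
})

print(out)
```

Because all formatting happens on the copy, the order of the two `sprintf()` calls does not matter and the numeric table remains available for further calculations. Cells that are NA come out as the text NA with `sprintf()`.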

EDIT3: The initial data is read sequentially into tables called "data", then a derivative output table "Event_Table" is generated in which I am inserting summaries of various aspects of each "data" table (i.e. totals, means, medians etc). I then sequentially export the "Event_Tables" since these contain the required summary information for each "data" table.

Here is the start of the code:

# FIRST SET WORKING DIRECTORY WHERE INPUT FILES ARE!

files = list.files(pattern="Events_") # Import file names containing the "Events_" string into variable "files"
files1 = length(files) # Count number of files
files2 = read.table(text = files, sep = "_", as.is = TRUE) # Split file names by "_" separator and create table "files2"

for (j in 1:files1)
{data <- read.table(files[j], header=TRUE) # Import the data table from file number j

# Making derivative dataframes:
Event_Table <- data.frame(matrix(NA, nrow = 46, ncol = 6)) # Creates dataframe of arbitrary size full of NAs
names(Event_Table) <- c("ChrSize","Chr","CO","NCO","NA","Total") # Adds column names to dataframe
Event_Table ["Chr"] = c(1:16, "TOTAL","Event Lengths:","Min Len", "Max Len","Mean Len","Median Len","Chromatids:","1_chrom","2_chrom","3_chrom","4_chrom","Classe:","1_1brin","1_2brins","2_nonsis","2_sis","classe_3","classe_4","Fraction of Chromatids:","1_chrom","2_chrom","3_chrom","4_chrom","Fraction of each Classe:","1_1brin","1_2brins","2_nonsis","2_sis","classe_3","classe_4") # Inserts the row labels (1 to 16, then the summary labels) into column "Chr"
Event_Table [1:16,"ChrSize"] = c(230218,813184,316620,1531933,576874,270161,1090940,562643,439888,745751,666816,1078177,924431,784333,1091291,948066)
Event_Table [17,"ChrSize"] =sum(Event_Table [1:16,"ChrSize"])

nE = nrow(data) # Total number of events
Event_Table [17,"Total"] = nrow(data)
Event_Table [19,"Total"] = min(data$len)
Event_Table [20,"Total"] = max(data$len)
Event_Table [21,"Total"] = mean(data$len) # data$len is a plain vector; mean() on the one-column data frame data["len"] returns NA with a warning in current R
Event_Table [22,"Total"] = median(data$len)

# More stuff here, etc, then close the j loop
}

So the Event_Table is set up as a data.frame of type matrix filled with NAs. I then fill it manually with relevant info in relevant grid positions. I then simply want to format the visual appearance of these fields.

If I am going about this all wrong, then please can you suggest a better way to do this! Thanks

Do `str(Event_Table)` and make sure it is really structured the way you are thinking. The error message mentions a column name that your code must be accessing or it shouldn't throw an error. — Bryan Hanson, May 01 '15 at 08:22
Maybe this [question](https://stackoverflow.com/questions/11228403/setting-default-number-of-decimal-places-for-printing) about printing will help you, but you haven't said how you are exporting. — Bryan Hanson, May 01 '15 at 08:25
I am exporting to a text file using "writeLines" and "write.table" commands. This works fine. The error arises when I try to run the second "format" command. — Matt, May 06 '15 at 09:03
Running `str(Event_Table)` gives: `'data.frame': 46 obs. of 6 variables:` `$ ChrSize: num 230218 813184 316620 1531933 576874 ...` `$ Chr : chr "1" "2" "3" "4" ...` `$ CO : chr "4" "6" "2" "13" ...` `$ NCO : chr "1" "6" "3" "20" ...` `$ NA : chr "0" "0" "0" "0" ...` `$ Total : chr "5" "12" "5" "33" ...` — Matt, May 06 '15 at 09:06
Note: I'm using R-studio and I can see that the data are arranged in the correct manner. — Matt, May 06 '15 at 09:09
Just to clarify, you have a `data frame` not a `data table`, they are actually different things. I'm not sure how you created this data frame, because data frames cannot have different data types in a particular column. At row 17 and then at row 18 the nature of the data changes significantly, and that is what is causing your error. I would go back and read the data in differently. I don't know how you did that, so all I can suggest is that you look at the arguments of whatever function you used. — Bryan Hanson, May 06 '15 at 11:02
You could use `read.table` and arguments `skip` and `nrows` to read in the first 16 rows, and in a separate call, rows 18 - end. Those sections have a consistent structure. Then you will be ready (and it will be possible) to deal with your original formatting question. — Bryan Hanson, May 06 '15 at 11:07
Also, notice how the `str` command shows that the columns that appear numerical are actually character, again a side effect of the varied nature of the data. It's going to be hard to format character data numerically. — Bryan Hanson, May 06 '15 at 11:43
Hi Bryan. This is not an original source of the data. I've added some clarification to the post, to see if that helps find out where I am going wrong. I only started working with R two weeks ago. — Matt, May 07 '15 at 11:17
First, as a new `R` user your coding is off to a good start, esp. the documenting as you go. Important habit, keep it up! Assuming your goal is to write this info as a report in a file, you are creating 3 pieces each of which needs different formatting. I suggest that you keep them as separate pieces, format them as you desire, then 'print' them sequentially so they appear as a whole. To do that, you could use `sink()` to open a file, then write to it 3x using `print` or possibly `cat`. Or, not tested, use `write.table` and then `append` to it with 2x writes. That might be your best bet. — Bryan Hanson, May 07 '15 at 11:49
Hi Bryan, I've also come to the conclusion that separating out these (and potentially subsequent output tables) is the best solution. I already currently write out and append the output to a text file that first has a header written in it, so that should work. I am still interested to understand why the sequential use of the format command throws an error. Also, many thanks for the kind words, I was brought up in the 80s coding Sinclair BASIC, so I'm familiar with using variables, loops and arrays, but not quite so used to vectors and the ability to directly execute arithmetic on them! — Matt, May 07 '15 at 15:16
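
On the remaining question of why the second `format()` call fails: a data frame column can hold only one type, and `format()` returns character strings. Assigning those strings into rows 36:39 converts each of columns 3:6 to character in full, so the later `round()` on rows 41:46 receives character data, which is what the `Math.data.frame` error reports and what the `str()` output above shows. A small sketch with a made-up data frame `df` (not the real table):

```r
# Hypothetical two-column numeric data frame
df <- data.frame(a = c(1, 2.345, 3.1, 4), b = c(5, 6.789, 7, 8.25))
before <- sapply(df, class)

# First format call: works, but writes character strings into numeric columns
df[1:2, ] <- format(round(df[1:2, ], 2), nsmall = 2)
after <- sapply(df, class)

# Second call now fails, because round() is given character columns
second <- tryCatch(
  round(df[3:4, ], 2),
  error = function(e) conditionMessage(e)
)

print(before)
print(after)
print(second)
```

Either call works when run first, because at that point the columns are still numeric; it is only the second one that meets character columns.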

score 0 · Answer 1 · edited May 23 '17 at 11:51


It could be a similar problem to the one in the question "Error in Math.data.frame.....non-numeric variable in data frame:". Maybe you have commas in your data. If that is not the case, could you show what is in your table?

answered May 01 '15 at 08:17 by Freeworld · edited by Community

It definitely seems to be the same error - presumably caused by the contents of the dataframe (parts of the dataframe contain text and some NAs, but all of the regions I am formatting contain only numbers). Moreover, what I don't understand is why either line when run in isolation works fine with no error. The problem arises when I try to run a second "format" command on a second part of the table (which has not yet been formatted). – Matt May 06 '15 at 08:49
You have many NA's in your table, you can omit these by using `na.rm=TRUE`. See also http://statmethods.net/input/missingdata.html. – Freeworld May 06 '15 at 11:45
The NAs are expected, and I want them to be there. This is an output table arranged in a manner suitable for visualisation. It is not the raw data. Hence wanting there to be some blank (NA) spaces in it. And hence me wanting to format the decimals appropriately. – Matt May 06 '15 at 12:47
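
For context, `na.rm = TRUE` is an argument of summary functions such as `mean()`, so it changes how a value is computed rather than how the table is displayed. If the intentionally empty cells should be written as blanks instead of the text NA, `write.table()` has an `na` argument for that. A sketch with a made-up data frame and a temporary file:

```r
# Hypothetical two-row excerpt: one data row and one heading row with an empty cell
df <- data.frame(Chr = c("Min Len", "Chromatids:"), CO = c(0, NA),
                 stringsAsFactors = FALSE)

# na.rm drops missing values during a calculation
avg <- mean(df$CO, na.rm = TRUE)

# na = "" writes missing cells as empty fields instead of the text NA
tmp <- tempfile(fileext = ".txt")
write.table(df, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE, na = "")
written <- readLines(tmp)
print(written)
```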

score 0 · Accepted Answer · answered May 07 '15 at 12:04

Here is a proof of concept using 2 rather different data frames:

DF1 <- data.frame(x = rnorm(10), person = rep(LETTERS[1:2], 5))
DF2 <- data.frame(y = 1:10L, result = rep(LETTERS[3:4], 5), alt = rep(letters[3:4], 5))
write.table(DF1, file = "example.csv", sep = ",")
write.table(DF2, file = "example.csv", sep = ",", append = TRUE)

This issues a warning (about column names - no problem) and gives:

x   person      
1   0.796933543 A   
2   1.495800567 B   
3   0.359153458 A   
4   2.105378598 B   
5   0.175455314 A   
6   -1.850171347    B   
7   -0.87197177 A   
8   2.682650638 B   
9   1.040676847 A   
10  -0.086197042    B   
y   result  alt 
1   1   C   c
2   2   D   d
3   3   C   c
4   4   D   d
5   5   C   c
6   6   D   d
7   7   C   c
8   8   D   d
9   9   C   c
10  10  D   d

From here you can control the formatting as desired. You may wish to suppress the column names or give more informative ones, and you probably don't want the row numbering either. See ?write.table for all the options.
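
As a rough sketch of those options, assuming the pieces have already been formatted as text: the data frames `counts` and `fractions` and the temporary output file below are made up for illustration.

```r
# Hypothetical pieces of a report, already formatted as text where needed
counts    <- data.frame(Chr = c("1", "2"),
                        CO = sprintf("%.0f", c(4, 6)),
                        stringsAsFactors = FALSE)
fractions <- data.frame(Chr = c("1_chrom", "2_chrom"),
                        CO = sprintf("%.2f", c(0, 0.786407766990291)),
                        stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")  # temporary path, for illustration only

# First piece: keep the header, drop row numbers and quotes
write.table(counts, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Second piece: append without repeating the header
write.table(fractions, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE, append = TRUE)

report <- readLines(out_file)
print(report)
```

With `col.names = FALSE` on the appended piece, the warning about appending column names does not appear.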
