I have a large database of more than 85,000 values for more than 100 different companies tagged along more than 100 variables. My goal is to identify the descriptive statistics (mean, standard deviation, min max, and the number of values) corresponding to several variables.
The following is a set of information on one given company which I'll call Company F.
Attendance Number of representatives Number of Presenters Company Audience
29 2 30 2
20 3 30 4
30 10 20 5
40 20 10 5
10 30 13 5
What I am looking to do is to have R calculate the descriptive statistics [mean, standard deviation, min, and max] and for each one of these specific columns and export it into Excel in the following manner:
Company F Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience
Because this is such a long row, I'll summarize it by saying that I am trying to find the descriptive statistics [mean, standard deviation, min, max, and n] of each one of these columns. These should all correspond to company F.
How I have tried to solve this problem:
I have used the descriptive statistics feature in R to get the dataframe to identify the code for me. To do this I used the psych package:
library(psych)
describe(CompanyF$Attendance)
describe(CompanyF$NumberofRepresentatives)
describe(CompanyF$Number_of_Presenters
describe(CompanyF$Company Audience)
From using the package I was able to get dataframes and then go into Excel and construct the rows by hand, inputting the values that I received and omitting any information given by the psych library package that didn't correspond to what I was interested in. The following is an example of the type of information I got from the psych package:
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 559 2.02 2.21 1 1.75 1.48 0 9 9 0.78 -0.65 0.09
This process is very time consuming and is open for errors. Upon finishing the work for Company F, I then create a new row in Excel just underneath the one for Company F but this time for another company such as Company G whereupon I continue the process of finding the descriptive statistics [mean, standard deviation, min, max, and n] for each one of these variables of interest (attendance, number of representatives, number of presenters, and company audience).
I have looked for various solutions one of which came from this stack overflow post Export data from R into Excel but I was not able to find an explanation of how to import from R row by row information into Excel as well as how to specify that it identify descriptive statistics that I have listed above.
Ideally, I would have the following output placed into Excel:
Company F Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience
Company G Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience
Company H Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience
and so on.
A raw subset of my data is below:
structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L,
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA,
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L,
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L,
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L,
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L,
15L), .Label = c("Broome Street", "Company A", "Company B", "Company BC",
"Company C", "Company CC", "Company D Clinton", "Company DD",
"Company E", "Company ED BroadCompany", "Company G", "Company H
BroadCompany",
"Company I BroadCompany", "Company I Studio", "Company J", "Company K",
"Company L", "Company M", "Company M BroadCompany", "Company M HS
BroadCompany",
"Company MCC BroadCompany", "Company N", "Company P", "Company Q",
"Company Q Company N", "Company Q Company ZZ", "Company R - Company ZZ",
"Company SLab", "Company Z", "Company ZE", "Company ZED", "Company ZEQ",
"Company ZZ", "Company ZZQ", "Company ZZQ Company N"), class = "factor"),
earn_tot = c(21.85, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, NA, NA, 25.82,
40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA,
30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 8.25,
NA, NA), earn_and_current_tot = c(29.43, 20.8, NA, 8.16,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 49.9, NA, 37.56, NA, NA, 41.98,
NA, NA, NA, NA, NA, 37.32, 49, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0, NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 47.68,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57.29, 48.48, NA,
7, 45.9, 0, NA, NA, 15.75, NA, NA), pass_99 = c(0L, 0L, NA,
NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, NA,
NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L,
NA, NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, 1L, NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L,
NA, NA), pass_65 = c(0L, 0L, 5L, 0L, 6L, NA, 0L, 5L, NA,
5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, NA, NA,
NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, NA,
NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA,
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L,
0L, 5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L)), .Names = c("sn",
"earn_tot", "earn_and_current_tot", "pass_99", "pass_65"), row.names = c(NA,
80L), class = "data.frame")
There are four subset columns that are of most importance. These columns are "earn_tot", "earn_and_current_tot", "pass_99", and "pass_65". There are numerous companies listed here which have been anonymized. There are roughly 100 companies I am working with. There are numerous company names under the column entitled "sn". The name of the entire subsetted dataset is called Subset.MergedEx.So.
I apologize for not putting up a good reproducible example. Thank you for your patience. I've been reading how to construct one and have used the following code: dput((head(Subset.MergedEx.SO, 80)))