I want to use the function skim
from R package skimr
to produce summary statistics of multiple datasets. To save space, I need to prioritize information that gets displayed. I would like to remove these rows from the Data Summary section of skim
output: "Name", "Column type frequency", and "Group variables". Is there an easy way to do this?
I tried skim(iris) and got the following:
-- Data Summary ------------------------
Values
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
Instead, I want to display the following:
-- Data Summary ------------------------
Values
Number of rows 150
Number of columns 5
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃