
I have the data.table below and want to add a new column called `outcome` that shows the skewness of each row.

For example, if a row has the values (336, 0, 0, 0), it is highly skewed to the left and the outcome should be -1. If a row has the values (0, 0, 0, 336), it is highly skewed to the right and the outcome should be +1. In all other cases the outcome should lie between -1 and +1; for instance, a row with the values (200, 100, 20, 16) should give an outcome of around -0.8, indicating that it is skewed to the left.
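One way to make this target concrete is a position-weighted score: treat the column labels (-4, -2, 2, 4) as positions, take the count-weighted mean position, and divide by the largest absolute position. This is only a sketch under that assumption, not something proposed in the thread; `balance_score` is a hypothetical helper name. It reproduces -1 and +1 for the two extreme rows but gives about -0.67 rather than -0.8 for the third example.

```r
# Hypothetical helper (not from the thread): position-weighted balance score.
# `counts` are the row values; `positions` are the numeric column labels.
balance_score <- function(counts, positions = c(-4, -2, 2, 4)) {
  total <- sum(counts)
  if (total == 0) return(NA_real_)  # undefined for an all-zero row
  sum(counts * positions) / (total * max(abs(positions)))
}

balance_score(c(336, 0, 0, 0))      # -1
balance_score(c(0, 0, 0, 336))      #  1
balance_score(c(200, 100, 20, 16))  # about -0.67
```

On the original (non-negated) counts this could be applied row-wise with something like `temp_dt[, outcome := apply(.SD, 1, balance_score), .SDcols = c("-4", "-2", "2", "4")]`.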

temp_dt = structure(list(date = structure(c(19425L, 19426L, 19429L, 19430L, 
                                  19431L, 19432L, 19433L, 19436L, 19437L, 19438L, 19439L, 19440L
), class = c("IDate", "Date")), 
`-4` = c(336L, 0L, 310L, 315L, 
         321L, 284L, 230L, 117L, 43L, 35L, 36L, 31L), 
`-2` = c(0L, 0L, 
         21L, 21L, 18L, 47L, 86L, 171L, 105L, 192L, 224L, 208L), 
`2` = c(0L, 
        0L, 0L, 0L, 0L, 0L, 3L, 26L, 149L, 88L, 67L, 84L), 
`4` = c(0L, 
        336L, 5L, 2L, 0L, 6L, 20L, 21L, 39L, 20L, 10L, 9L)), 
row.names = c(NA, 
              -12L), class = c("data.table", "data.frame"), sorted = "date")
> temp_dt 
          date  -4  -2   2   4
 1: 2023-03-09 336   0   0   0
 2: 2023-03-10   0   0   0 336
 3: 2023-03-13 310  21   0   5
 4: 2023-03-14 315  21   0   2
 5: 2023-03-15 321  18   0   0
 6: 2023-03-16 284  47   0   6
 7: 2023-03-17 230  86   3  20
 8: 2023-03-20 117 171  26  21
 9: 2023-03-21  43 105 149  39
10: 2023-03-22  35 192  88  20
11: 2023-03-23  36 224  67  10
12: 2023-03-24  31 208  84   9

I tried a chi-square test to solve this problem, but the results are not what I expected:

temp_dt[, outcome := chisq.test(.SD), by = .(date)]
> temp_dt
          date  -4  -2   2   4    outcome
 1: 2023-03-09 336   0   0   0 1008.00000 # left-skewed row
 2: 2023-03-10   0   0   0 336 1008.00000 # right-skewed row gets the same outcome, which is not the desired result
 3: 2023-03-13 310  21   0   5  813.59524
 4: 2023-03-14 315  21   0   2  841.52663
 5: 2023-03-15 321  18   0   0  880.64602
 6: 2023-03-16 284  47   0   6  646.98813
 7: 2023-03-17 230  86   3  20  377.28319
 8: 2023-03-20 117 171  26  21  190.93433
 9: 2023-03-21  43 105 149  39   99.66667
10: 2023-03-22  35 192  88  20  217.03582
11: 2023-03-23  36 224  67  10  328.41246
12: 2023-03-24  31 208  84   9  286.81928
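The identical values for rows 1 and 2 are consistent with how a goodness-of-fit chi-square statistic is built: it sums the squared deviations from the expected count over all cells, so it measures how uneven the counts are, not in which direction. A minimal base R check, assuming each row is tested against equal expected counts (which matches the 1008 shown above):

```r
# Goodness-of-fit against a uniform distribution: the statistic depends only
# on how far each count is from the expected count, not on which column it is in.
left  <- c(336, 0, 0, 0)
right <- c(0, 0, 0, 336)

stat_left  <- unname(chisq.test(left)$statistic)
stat_right <- unname(chisq.test(right)$statistic)

stat_left                # 1008
stat_left == stat_right  # TRUE
```

Any permutation of the same four counts gives the same statistic, so the test cannot separate left-heavy from right-heavy rows.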

Is there another way to solve this problem and measure the skewness of each row? I would appreciate a data.table solution.

The solution suggested by TarJae and Pax in the comments:

temp_dt[, `-4` := `-4` * -1]
temp_dt[, `-2` := `-2` * -1]
temp_dt[, outcome := apply(.SD, 1, moments::skewness), .SDcols = c("-4", "-2", "2", "4")]

> temp_dt
          date   -4   -2   2   4    outcome
 1: 2023-03-09 -336    0   0   0 -1.1547005
 2: 2023-03-10    0    0   0 336  1.1547005
 3: 2023-03-13 -310  -21   0   5 -1.1361798
 4: 2023-03-14 -315  -21   0   2 -1.1393005
 5: 2023-03-15 -321  -18   0   0 -1.1448131
 6: 2023-03-16 -284  -47   0   6 -1.0565676
 7: 2023-03-17 -230  -86   3  20 -0.6687309
 8: 2023-03-20 -117 -171  26  21 -0.1431630
 9: 2023-03-21  -43 -105 149  39  0.3021199
10: 2023-03-22  -35 -192  88  20 -0.5703578
11: 2023-03-23  -36 -224  67  10 -0.7789411
12: 2023-03-24  -31 -208  84   9 -0.6480315
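The outcomes above reach about ±1.1547, which is outside the requested range of -1 to +1. For the moment-based skewness of n values, the largest possible magnitude is (n - 2) / sqrt(n - 1), which is 2 / sqrt(3) ≈ 1.1547 for n = 4, so dividing by that bound is one possible way to rescale. The sketch below is an assumption on top of the thread's approach, written in base R; `skew` and `scaled_skew` are hypothetical helper names.

```r
# Moment-based sample skewness, m3 / m2^(3/2), written in base R.
# This is the same formula moments::skewness() uses for a numeric vector.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical helper: divide by the largest value the statistic can take
# for n observations, (n - 2) / sqrt(n - 1), so the result lies in [-1, 1].
scaled_skew <- function(x) {
  n <- length(x)
  skew(x) / ((n - 2) / sqrt(n - 1))
}

skew(c(-336, 0, 0, 0))         # about -1.1547, as in row 1 above
scaled_skew(c(-336, 0, 0, 0))  # -1 (up to floating-point error)
scaled_skew(c(0, 0, 0, 336))   #  1 (up to floating-point error)
```

With the sign-flipped columns this could replace `moments::skewness` in the `apply()` call. A row whose four values are all equal would give `NaN`, because the variance in the denominator is zero.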
Saurabh
  • Can [this SO post](https://stackoverflow.com/questions/38034776/hypothesis-testing-skewness-and-or-kurtosis-in-r) help? Or maybe even better, [this one](https://stackoverflow.com/a/48029065/8245406)? – Rui Barradas Mar 27 '23 at 19:24
  • I tried something like ```temp_dt[, outcome := skewness(.SD)[1], by = .(date)]``` but I am getting all NA values in the outcome column. – Saurabh Mar 27 '23 at 19:28
  • What about multiplying columns 2 and 3 by `-1` and applying a skewness check afterwards? – Pax Mar 27 '23 at 19:46
  • Pax, I tried your suggestion and am getting the error - ```Error in chisq.test(.SD) : all entries of 'x' must be nonnegative and finite``` – Saurabh Mar 27 '23 at 19:50
  • Try this: `temp_dt[, outcome := apply(.SD, 1, moments::skewness), .SDcols = c("-4", "-2", "2", "4")]` – TarJae Mar 27 '23 at 19:50
  • Of course, that does not work for the chi-square test; use a function that checks for skewness. – Pax Mar 27 '23 at 19:51
  • TarJae, mixing your code with the suggestion by ```Pax``` gave a better result. I have shown the result in the question above. – Saurabh Mar 27 '23 at 19:53
  • TarJae, please add your suggestion as answer. – Saurabh Mar 27 '23 at 20:01

0 Answers