0

After the processing of the data, I have a data set something like this:

enter image description here

And you can find the data set here: http://www.sharecsv.com/s/361731b8a7236a8ae147584e44518872/dfk.csv

In case if you can not access the data, here is a sample:

dfk <- tribble(
  ~group,           ~date,       ~id,         ~share,       ~status,
'FALSE',        'January 20',   '25092',   '0,050904271%',   'A',     
'NA',           'April 19',     '29374',   '60,584652862%',  'B',     
'TRUE',         'January 20',   '22625',   '87,401561145%',  'B',     
'TRUE',         'January 20',   '315618',  '99,697519661%', 'A',     
'FALSE & TRUE', 'January 20',   '31002',   '100%         ',  'C',     
'TRUE',         'April 19',     '21788',   '99,836975729%',  'A',     
'TRUE',         'January 20',   '1362',    '76,418519990%',  'A',     
'NA',           'January 20',   '29374',   '59,873882219%',  'C',     
'TRUE',         'April 19',     '9961',    '45,869691681%',  'A',     
'FALSE',        'January 20',   '62184',   '33,833598912%',  'A', 
)

In the data set, there are only two dates: April 19 and January 20. In addition to this, I have three dimensions that are important for my analysis: group, id, and status. By considering all these variables, I would like to see how the share changed from April 19 to January 20.

I've already shown my desired data in the below image:

enter image description here

I would be happy if you could help me to do this. Thanks in advance.

datazang
  • 989
  • 1
  • 7
  • 20
  • Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557/3358272 (and https://xkcd.com/2116/). Please just include the code or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. – r2evans Jan 29 '20 at 21:47
  • Additionally, when links go stale, the question becomes unreproducible. Can you provide sample data in the question itself? This would involve using something like `dput(head(x,10))` or programmatically with `data.frame(...)`, and your expected output from that sample. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Jan 29 '20 at 21:48
  • You're right, sorry. I've just added the sample data as a tibble. But since it is quite hard to produce the expected output in R, I had to attach the image from Excel. @r2evans – datazang Jan 29 '20 at 22:03
  • 1
    In the original data there is an A and FALSE on the same row but in the output there is no column for A and FALSE. – G. Grothendieck Jan 29 '20 at 22:10
  • Right, good point. I've basically sum FALSE and TRUE in the output. But that is totally fine to have the column for A and FALSE. @G.Grothendieck – datazang Jan 29 '20 at 22:14

1 Answers1

2

1) Convert share to numeric and create a table xt from it. Then display the table using ftable. Now you can remove or aggregate any columns you wish or perhaps just leave it as it is or look at (2) below. No packages are used.

dfk$share <- as.numeric(chartr(",%", ". ", dfk$share))
xt <- xtabs(share ~., dfk)
ftable(xt, row.vars = "id", col.vars = c("status", "group"))

giving:

       status            A                                                   B                                                   C                                       
       group         FALSE FALSE & TRUE           NA         TRUE        FALSE FALSE & TRUE           NA         TRUE        FALSE FALSE & TRUE           NA         TRUE
id                                                                                                                                                                       
1362            0.00000000   0.00000000   0.00000000  76.41851999   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
21788           0.00000000   0.00000000   0.00000000  99.83697573   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
22625           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000  87.40156115   0.00000000   0.00000000   0.00000000   0.00000000
25092           0.05090427   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
29374           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000  60.58465286   0.00000000   0.00000000   0.00000000  59.87388222   0.00000000
31002           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000 100.00000000   0.00000000   0.00000000
315618          0.00000000   0.00000000   0.00000000  99.69751966   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
62184          33.83359891   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
9961            0.00000000   0.00000000   0.00000000  45.86969168   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000

2) Assuming share has been converted to numeric as above, an alternative is data.table:

library(data.table)
dcast(data = dfk, id ~ status + group, value.var = "share", fill = 0)

giving:

      id     A_FALSE   A_TRUE     B_NA   B_TRUE C_FALSE & TRUE     C_NA
1   1362  0.00000000 76.41852  0.00000  0.00000              0  0.00000
2  21788  0.00000000 99.83698  0.00000  0.00000              0  0.00000
3  22625  0.00000000  0.00000  0.00000 87.40156              0  0.00000
4  25092  0.05090427  0.00000  0.00000  0.00000              0  0.00000
5  29374  0.00000000  0.00000 60.58465  0.00000              0 59.87388
6  31002  0.00000000  0.00000  0.00000  0.00000            100  0.00000
7 315618  0.00000000 99.69752  0.00000  0.00000              0  0.00000
8  62184 33.83359891  0.00000  0.00000  0.00000              0  0.00000
9   9961  0.00000000 45.86969  0.00000  0.00000              0  0.00000
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341