1

I would like to solve this either in Excel or in SPSS:

I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)

enter image description here

eli-k
  • 10,898
  • 11
  • 40
  • 44
MSBSresearch
  • 53
  • 1
  • 4

2 Answers2

1

For SPSS:

First just creating some sample data to demonstrate on:

data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.

Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.

do repeat vr=diag1 to diag9/vl=1 to 9.
    compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.

Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.

eli-k
  • 10,898
  • 11
  • 40
  • 44
  • Hello, this would be ideal. I do not have much experience with the SPSS syntax. How do I define the data (the first line of syntax you have written)? – MSBSresearch Dec 30 '19 at 17:17
  • assuming your string has the name that shows in the picture (in your post) you can just open a new syntax window, past the second block of code into it, mark it all and run it. And get acquainted with syntax soon - it is well worth it :) – eli-k Dec 30 '19 at 17:20
  • It worked :) it was "cerv" in the picture and "serv" in your syntax. Thank you and Happy New Year!!! – MSBSresearch Dec 30 '19 at 17:26
1

Assuming that the number of columns is fairly regular, I would suggest using text to columns, and then using COUNTIF on the cells if they are the value wanted. However there is a more robust and reproducible solution that would involve using SQL. If you download the free version of SQL Express here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads

Then you can import your table of data, here's how to do that: How to import an Excel file into SQL Server?

Then you could use the more friendly SQL database to get the answers you want. For example you can use a select statement that would say:

SELECT count(e_cerv_dis_state)
WHERE e_cerv_dis_state = '6'

It would also be possible to use a CASE WHEN statement to add-in the names of the diagnoses.

Plato77
  • 422
  • 3
  • 15