4

Sorry if this has been answered before, I'm not even sure how to search for it. I'm happy with any automated solution in R, VBA, or SPSS.

I have a huge set of demographic data like this:

ID <- c(1, 2, 3, 4, 5)
State <- c("FL", "FL", "FL", "FL", "FL")
County <- c("Lake", "Lake", "Lake", "Orange", "Orange")
Household <- c (2, 1, 3, 2, 1)
First.Gender <- c("Male", "Female", "Male", "Female", "Male")
Second.Gender <- c("Male", "-", "Female", "Female", "-")
Third.Gender <- c("-", "-", "Male", "-", "-")

Gender_Example <- data.frame(ID, State, County, Household, First.Gender, Second.Gender, Third.Gender)

and I'd like to find a way to create new rows based on what's in the column (without creating blank rows). Something that looks like this:

ID_i <- c(1, 1, 2, 3, 3, 3, 4, 4, 5) # _i designates my ideal set
State_i <- c("FL", "FL", "FL", "FL", "FL", "FL", "FL", "FL", "FL")
County_i <- c("Lake", "Lake", "Lake", "Lake", "Lake", "Lake", "Orange", "Orange", "Orange")
Household_i <- c(2, 2, 1, 3, 3, 3, 2, 2, 1)
Gender_i <- c("Male", "Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male")

Gender_ideal <- data.frame(ID_i, State_i, County_i, Household_i, Gender_i)

If this has already been asked then I'd be happy just have a link. Thank you!

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
ashurtz
  • 43
  • 5
  • 2
    Welcome to SO! Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Do **not** use images to show your data. – jogo Feb 21 '18 at 15:01
  • In R, load the "reshape2" package or "data.table" and use `melt`. Search for "reshape wide to long" to find many solutions. If blank rows are created, they're easy to remove. – A5C1D2H2I1M1N2O1R2T1 Feb 21 '18 at 15:11
  • In SPSS, you're going to want to look at `varstocases`. – A5C1D2H2I1M1N2O1R2T1 Feb 21 '18 at 15:12
  • And see [this answer](https://stackoverflow.com/a/19532976/1270695) for Excel, using the Tableau plugin for reshaping data. – A5C1D2H2I1M1N2O1R2T1 Feb 21 '18 at 15:23
  • Thank you, everyone. I changed it to R to satisfy requirements about displaying questions. Sorry, I'm new to both R and SO – ashurtz Feb 21 '18 at 15:24
  • The desired outcome is to separate rows into individual observations, instead of having multiple persons included in one row. So instead of having a row say "there's two people in this household, one is male and the other female" I would have one row that says "a male lives in a two-person household" and a seperate row that say "a female lives in a two-person household" while still copying the other data associated with them into the new rows – ashurtz Feb 21 '18 at 15:33

2 Answers2

3

R

In R, your best choices are going to be melt from "data.table" (which lets you use "patterns" to identify your measure variables. With that, you would do:

library(data.table)
melt(setDT(Gender_Example), measure.vars = patterns("Gender$"))[value != "-"]

Alternatively, there's the "tidyverse" approach.

library(tidyverse)
Gender_Example %>%
  gather(variable, value, ends_with("Gender")) %>%
  filter(value != "-")

SPSS

In SPSS, you would want to look at varstocases. There's a pretty good writeup here that should help you get started.


Excel

This might depend on the version of Excel you're using. If you are using 2016, you can use the pivot table wizard on your data, and then double click on the grand totals to access the underlying "long" table that would have been used to create the table.

The process is outlined at this video.

Alternatively, you can use the Tableau reshaping tool as described at this video

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
1

This might be of help

library(reshape2)

Gender_ideal <- melt(Gender_Example, id=c(names(Gender_Example)[1:4]))
rows.to.remove <- which(Gender_ideal$value == "-")
Gender_ideal <- Gender_ideal[-rows.to.remove,]
Gender_ideal
   ID State County Household      variable  value
1   1    FL   Lake         2  First.Gender   Male
2   2    FL   Lake         1  First.Gender Female
3   3    FL   Lake         3  First.Gender   Male
4   4    FL Orange         2  First.Gender Female
5   5    FL Orange         1  First.Gender   Male
6   1    FL   Lake         2 Second.Gender   Male
8   3    FL   Lake         3 Second.Gender Female
9   4    FL Orange         2 Second.Gender Female
13  3    FL   Lake         3  Third.Gender   Male
niko
  • 5,253
  • 1
  • 12
  • 32