I have a dataset that looks like this:
> head(data)
     phrase count per.million ratio variety clitic.type pronoun
1  ME SINTO 12553        15.0   2.4      BP   proclitic      me
2    ME DEU  8911        10.7   2.1      BP   proclitic      me
3  ME DISSE  7260         8.7   1.8      BP   proclitic      me
4 ME PARECE  5397         6.5   0.5      BP   proclitic      me
5  ME SENTI  4883         5.8   2.1      BP   proclitic      me
6    ME FEZ  4439         5.3   1.2      BP   proclitic      me
where count is the total number of times the phrase appears in a corpus. I want to run a logistic regression on the data with lrm(), but as it stands the model treats each row of my dataset as a single observation rather than using the values in count.
I need to reshape the data so that there is a separate row for each occurrence of the phrase (so 12553 rows for ME SINTO). I have been trying to do this with melt(), as follows:
meltdata <- melt(data, id=c("phrase", "variety", "clitic.type", "pronoun", "per.million", "ratio"))
but this does not separate the count variable in the way I need, as shown below.
> head(meltdata)
     phrase variety clitic.type pronoun alt.count per.million ratio variable value
1  ME SINTO      BP   proclitic      me      2161        15.0   2.4    count 12553
2    ME DEU      BP   proclitic      me      1746        10.7   2.1    count  8911
3  ME DISSE      BP   proclitic      me      1681         8.7   1.8    count  7260
4 ME PARECE      BP   proclitic      me      4618         6.5   0.5    count  5397
5  ME SENTI      BP   proclitic      me       949         5.8   2.1    count  4883
6    ME FEZ      BP   proclitic      me      1467         5.3   1.2    count  4439
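For reference, the one-row-per-occurrence layout described above can be sketched in base R by repeating each row index as many times as its count. This is only a sketch under stated assumptions: the data frame toy and its small counts are made up for illustration (they are not the real corpus figures), and only a few of the columns are reproduced.

```r
# Hypothetical toy data mirroring the structure of the real dataset,
# with small made-up counts so the expanded result is easy to inspect.
toy <- data.frame(
  phrase      = c("ME SINTO", "ME DEU", "ME DISSE"),
  count       = c(3, 2, 1),
  variety     = "BP",
  clitic.type = "proclitic",
  pronoun     = "me",
  stringsAsFactors = FALSE
)

# Repeat each row index `count` times, then use the result to index the rows.
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$count), ]

rownames(expanded) <- NULL  # discard the duplicated row names
expanded$count <- NULL      # the count is now implicit in the number of rows

nrow(expanded)              # 6 rows: 3 + 2 + 1
table(expanded$phrase)      # occurrences per phrase
```

With the real data this would produce one row per corpus occurrence, so the result may be large. Whether expanding the rows or passing count as case weights to the model is preferable is a separate question that this sketch does not settle.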