I have a dataset that looks like this:

> head(data)
       phrase count per.million ratio variety clitic.type pronoun
1    ME SINTO 12553        15.0   2.4      BP   proclitic      me
2      ME DEU  8911        10.7   2.1      BP   proclitic      me
3    ME DISSE  7260         8.7   1.8      BP   proclitic      me
4   ME PARECE  5397         6.5   0.5      BP   proclitic      me
5    ME SENTI  4883         5.8   2.1      BP   proclitic      me
6      ME FEZ  4439         5.3   1.2      BP   proclitic      me

Here, count is the total number of times the phrase appeared in a corpus. I want to run a logistic regression on the data with lrm(), but as it stands the model treats each row as a single observation instead of using the values in count.

I need to reshape the data so that there is a separate row for each occurrence of the phrase (so 12553 rows for ME SINTO). I have tried to do this with melt() as follows,

meltdata <- melt(data, id=c("phrase", "variety", "clitic.type", "pronoun", "per.million", "ratio"))

but this does not separate the count variable in the way I need, as shown below.

> head(meltdata)
       phrase variety clitic.type pronoun alt.count per.million ratio variable value
1    ME SINTO      BP   proclitic      me      2161        15.0   2.4    count 12553
2      ME DEU      BP   proclitic      me      1746        10.7   2.1    count  8911
3    ME DISSE      BP   proclitic      me      1681         8.7   1.8    count  7260
4   ME PARECE      BP   proclitic      me      4618         6.5   0.5    count  5397
5    ME SENTI      BP   proclitic      me       949         5.8   2.1    count  4883
6      ME FEZ      BP   proclitic      me      1467         5.3   1.2    count  4439
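
For reference, the one-row-per-occurrence layout described above could be sketched in base R by repeating each row index count times. This is only a minimal sketch under assumptions: toy and expanded are hypothetical names, and the toy counts are deliberately tiny stand-ins for the real corpus counts so the result is easy to inspect.

```r
# Toy stand-in for the real data: same columns as head(data),
# but with small made-up counts (hypothetical values).
toy <- data.frame(
  phrase      = c("ME SINTO", "ME DEU"),
  count       = c(3, 2),
  variety     = c("BP", "BP"),
  clitic.type = c("proclitic", "proclitic"),
  pronoun     = c("me", "me")
)

# Repeat each row index as many times as its count, then index with it.
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$count), ]

# Tidy up: reset the row names and drop the now-redundant count column.
rownames(expanded) <- NULL
expanded$count <- NULL

expanded
```

With the real data this produces sum(data$count) rows, which may be large. The documentation for lrm() also lists a weights argument, so passing count as weights may be an alternative to expanding the data; whether that matches the intended analysis is something to check against the rms documentation.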
smag9467