
I have a data frame listing names and the number of times each name occurs in a specific year. When I subset it to a single name, say James, I cannot plot the subset. The data frame has one column listing names (thousands of them), one listing years, one listing gender (M or F), and one listing the number. I also split the subset by gender. The main data frame is called df1.

Here are the first ten lines of df1. No column is called years...

        Name  Gender  Number   Date
1       Mary  F       7065     ob1880  
2       Anna  F       2604     ob1880  
3       Emma  F       2003     ob1880  
4  Elizabeth  F       1939     ob1880  
5     Minnie  F       1746     ob1880  
6   Margaret  F       1578     ob1880  
7        Ida  F       1472     ob1880  
8      Alice  F       1414     ob1880  
9     Bertha  F       1320     ob1880  
10     Sarah  F       1288     ob1880  

df.james = subset(df1,df1 =="James")
df.split = split(df.james,df.james$Gender)
df.male = df.split$M

tbl = table(df.male) #this is the bit that doesn't work.

I get the following error:

Error in vector("integer", length) : vector size cannot be NA
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In bin + pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
3: In pd * nl : NAs produced by integer overflow

Also, when I try to tabulate two columns from that subset, it seems to include lots of values from the original data frame.

Brian Tompsett - 汤莱恩
Brockagh
  • Where is the code you are actually running? Please take the time to create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can see exactly what is causing your problem. – MrFlick Oct 19 '14 at 01:05
  • Thanks MrFlick. I added code now. – Brockagh Oct 20 '14 at 07:45
  • A reproducible example would also include a sample data set that would produce the same error that you are getting. I have no idea what's in `df1` so it's hard to say what might be going on. Do you really also have a column named "df1" in the data.frame "df1"? – MrFlick Oct 20 '14 at 17:13
  • Apologies. I hope this is clearer. – Brockagh Oct 21 '14 at 11:55
  • If that's what your data looks like, then the line `df.james = subset(df1,df1 =="James")` seems wrong because you don't have a column named `df1`. I would expect `df.james = subset(df1, Name =="James")` to be the correct syntax. – MrFlick Oct 21 '14 at 15:19
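
The suggestion in the last comment can be sketched as below. This is only a minimal sketch under assumptions: the rows of `df1` here are invented toy values in the same four-column layout as the question, and the columns are assumed to be factors. Calling `table()` on a whole data frame cross-tabulates every column against every other, which is a plausible source of the integer overflow, and unused factor levels left over after `subset()` would likely explain why other names still appear in later tables.

```r
# Toy stand-in for df1: same columns as in the question, made-up values.
df1 <- data.frame(
  Name   = c("Mary", "James", "James", "Anna", "James"),
  Gender = c("F", "M", "F", "F", "M"),
  Number = c(30, 20, 2, 10, 25),
  Date   = c("ob1880", "ob1880", "ob1880", "ob1880", "ob1881"),
  stringsAsFactors = TRUE  # assumption: the columns are factors
)

# Filter on the Name column instead of comparing the whole data frame.
df.james <- subset(df1, Name == "James")

# Drop factor levels that no longer occur in the subset, so names
# from the full data frame do not reappear in tables.
df.james <- droplevels(df.james)

# Split by gender and keep the male rows.
df.split <- split(df.james, df.james$Gender)
df.male <- df.split$M

# Tabulate a single column rather than the whole data frame.
tbl <- table(df.male$Date)

# Sum Number per Date, which is closer to what one would plot.
counts <- xtabs(Number ~ Date, data = df.male)
```

If the aim is a plot of the count per year, `counts` (rather than `tbl`, which only counts rows) is probably the object to pass to a plotting function.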

0 Answers