I have a data frame listing names and how many times each name occurs in a given year. When I subset it to find a specific name, say James, I cannot plot the subset.
The main data frame is called df1. It has one column listing names (thousands of them), one listing years, one listing gender (M or F), and one listing the number. I also split the subset by gender.
Here are the first ten lines of df1. Note that no column is actually called years; the year is stored in the Date column (e.g. ob1880).
        Name Gender Number   Date
1       Mary      F   7065 ob1880
2       Anna      F   2604 ob1880
3       Emma      F   2003 ob1880
4  Elizabeth      F   1939 ob1880
5     Minnie      F   1746 ob1880
6   Margaret      F   1578 ob1880
7        Ida      F   1472 ob1880
8      Alice      F   1414 ob1880
9     Bertha      F   1320 ob1880
10     Sarah      F   1288 ob1880
df.james = subset(df1,df1 =="James")
df.split = split(df.james,df.james$Gender)
df.male = df.split$M
tbl = table(df.male) #this is the bit that doesn't work.
I get the following error:
Error in vector("integer", length) : vector size cannot be NA
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In bin + pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
3: In pd * nl : NAs produced by integer overflow
Also, when I try to tabulate two columns from that subset, it seems to include lots of values from the original data frame.
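
Both symptoms may have the same cause. This is a minimal sketch, not a confirmed diagnosis, and it assumes the columns of df1 are stored as factors (the default for data.frame() and read.csv() before R 4.0). Under that assumption, table() on a whole data frame cross-tabulates every column over all of its factor levels, and unused levels survive subsetting, which would explain both the overflow and the extra values. The toy data below is a hypothetical stand-in for df1 (the James rows are made up), and the Year column is an illustrative addition.

```r
# Hypothetical toy stand-in for df1; the James rows are invented for illustration.
# stringsAsFactors = TRUE mimics the pre-R-4.0 default (an assumption about df1).
df1 <- data.frame(
  Name   = c("Mary", "James", "Anna", "James", "James"),
  Gender = c("F", "M", "F", "M", "F"),
  Number = c(7065, 100, 2604, 120, 5),
  Date   = c("ob1880", "ob1880", "ob1880", "ob1881", "ob1881"),
  stringsAsFactors = TRUE
)

# Filter on the Name column instead of comparing the whole data frame to "James"
df.male <- subset(df1, Name == "James" & Gender == "M")

# Unused factor levels survive subsetting, so other names still show up here
print(table(df.male$Name))

# droplevels() discards the levels that no longer occur in the subset
df.male <- droplevels(df.male)
print(table(df.male$Name))

# Number already holds the counts, so it can be plotted against the year directly.
# Year is a hypothetical helper column derived from Date ("ob1880" becomes 1880).
df.male$Year <- as.integer(sub("^ob", "", as.character(df.male$Date)))
plot(df.male$Year, df.male$Number, type = "b", xlab = "Year", ylab = "Number")
```

If the assumption holds, tabulating individual columns after droplevels() (for example table(df.male$Date)) should only show values present in the subset, whereas table(df.male) on the full data frame still builds a four-way table and is probably not what is wanted here.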