Dataset is baffling me

Question

I'm trying to manipulate the following dataset (from FBI crime statistics): https://www.ucrdatatool.gov/Search/Crime/Local/RunCrimeJurisbyJuris.cfm . It is in CSV format. Once it was downloaded, I used the following commands in R:

a=read.csv("RunCrimeJurisbyJuris.csv",header=FALSE);

Then I remove the unneeded rows at the top and the N/A columns at the end:

b=a[-c(1:5),-c(24,25)];

This looks right when viewed. For example, b[1,] shows the first row, as it should. However, when I try to use that row as the column names, for example,

 names(b)=b[1,]

I get what I THINK is a list of the levels. Why is it doing this?

The results are very confusing. I think it is related to this: when I look at, for example, b[1,1], instead of just getting "Year", I get

Year
41 Levels: ...

In addition, using View(b) produces an Excel-like view that looks like a normal dataset. It's been a while since I've used R, and as far as I recall, I've never seen this behavior before. I think these "Levels" are the source of the error. What am I doing wrong?

ABOVE IS SOLVED

Now, when I pull a column, say b["Population"], each element looks like this: (number)" ". Is there a way to remove these " "? Also, if I pull a specific value, say b[3,2], it has the form "number". This dataset is quite frustrating (: .

Levels refers to a factor variable; it means that some variables have been read in as categorical factors, rather than numeric, variables. — joran, Mar 21 '19 at 16:29
Possible duplicate of [How to convert a factor to integer\numeric without loss of information?](https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information) — DanY, Mar 21 '19 at 16:30
try to put `stringsAsFactors = FALSE` in your `read.csv` command — Mike, Mar 21 '19 at 16:30
@RLave This is intended, as the 24th/25th column are just a bunch of N/As when I imported it, and the first 5 rows are just unneeded information — Shinaolord, Mar 21 '19 at 16:31
@Mike Alright, I've never ran into this issue, even importing .csv-s with Strings in them. Very Interesting, I will report back momentarily — Shinaolord, Mar 21 '19 at 16:32
@Mike that worked, I was able to get the headers defined. Thanks! — Shinaolord, Mar 21 '19 at 16:37
It actually isn't entirely fixed, updating question at the moment. — Shinaolord, Mar 21 '19 at 16:47
@Shinaolord would you be able to share the dataset using dput instead of the link above — Mike, Mar 21 '19 at 17:11
I will try to add it, I made it, not sure if I can upload files, but if not I'll use my github. My only issue now is removing NAs. I figured out how to as.numeric everything, should I just ask it in a new question? Your original suggestion did answer my original question, after all. — Shinaolord, Mar 21 '19 at 17:14

score 0 · Accepted Answer · answered Mar 21 '19 at 18:08

The solution involves the following:

First, to remove the "levels" part, we need to stop read.csv from converting the strings to factors. Hence, we add the stringsAsFactors argument:

read.csv("file.csv",header=FALSE,stringsAsFactors=FALSE)
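To see what this changes, here is a minimal sketch using a small made-up CSV (the `csv_text` contents and the single junk line are assumptions for illustration, not the real FBI file). With strings kept as plain character values, a row can be promoted to column names directly. As far as I know, R 4.0.0 and later already default to stringsAsFactors=FALSE, so the argument mainly matters on older versions.

```r
# Hypothetical stand-in for the downloaded file:
# one junk line first, then the real header row, then the data.
csv_text <- "junk line,,\nYear,Population,Violent crime total\n1985,1000,12\n1986,1100,15\n"

# Read everything as plain strings (no factors, no header detection).
raw <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

b <- raw[-1, ]                                   # drop the junk line
names(b) <- unlist(b[1, ], use.names = FALSE)    # first remaining row holds the names
b <- b[-1, ]                                     # drop that row from the data
rownames(b) <- NULL                              # renumber the rows from 1

str(b)   # every column is still character at this point
```

The columns are still strings after this step, which is why a numeric conversion is needed afterwards.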

Then, we have the issue of everything being strings, even the numbers (at least I did). I fixed this with the following loop, applying as.numeric() to each column:

# convert every column of test2 (my data frame) from strings to numbers
for(i in seq_len(ncol(test2))){test2[[i]]=as.numeric(test2[[i]])}
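As a rough illustration of the conversion step, here is a self-contained sketch on a made-up data frame (the values in `test2` below are hypothetical). It also shows the factor pitfall behind the original "Levels" confusion: calling as.numeric() directly on a factor returns the internal level codes, not the values.

```r
# Hypothetical data frame of strings, standing in for the real one.
test2 <- data.frame(Year = c("1985", "1986", "1987"),
                    Population = c("1000", "", "1200"),
                    stringsAsFactors = FALSE)

# as.numeric() is vectorised, so each column converts in one call.
# Assigning into test2[] replaces the columns but keeps the data frame.
test2[] <- lapply(test2, as.numeric)

# The empty string has no numeric value, so it becomes NA.
is.na(test2$Population)

# Factor pitfall: converting a factor directly yields level codes.
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1  (positions in levels(f))
values <- as.numeric(as.character(f))   # 10 20 10 (the original numbers)
```

The lapply() form does the same job as an explicit loop over the columns; which one to use is mostly a matter of taste.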

as.numeric converts the strings to numbers; strings that cannot be parsed as numbers become NA. Then we can replace the NAs using a loop taken from the question "Replacing Missing Values with Column Mean". Filling in the column mean should only affect things like tests of statistical significance or the production of confidence intervals. This is the loop given in that question; it's pretty easy to understand:

for(i in 1:ncol(test2)){ test2[is.na(test2[,i]),i]=mean(test2[,i],na.rm=TRUE)};
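Here is a small worked sketch of the same mean-replacement idea on made-up numbers (the `test2` values below are hypothetical), so the effect is easy to check by hand: the missing Population value is replaced by the mean of the other two.

```r
# Hypothetical numeric data frame with one missing value.
test2 <- data.frame(Year = c(1985, 1986, 1987),
                    Population = c(1000, NA, 1200))

for (i in seq_len(ncol(test2))) {
  col <- test2[[i]]
  # Replace missing entries with the mean of the observed entries.
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  test2[[i]] <- col
}

test2$Population   # 1000 1100 1200
```

If dropping incomplete rows is preferable to filling them in, na.omit(test2) is the usual alternative; which one is appropriate depends on what the data will be used for.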

And, we're done!
