
I have a huge (30mb) text document of random usernames. It looks like this:

  • a0hrszq13k
  • a0huod_cv4q
  • a0hxyaszqfk
  • a0hz
  • a0i5dk349

So on and so forth...

I want to filter these so that I get a list of the names that are below a certain number of characters. For example, say I wanted all the names with fewer than 5 characters: using the small segment of data above, the only answer would be "a0hz". How can I get R to compute this itself and display the results?

First time I've asked a question so feel free to ask a follow-up question if this is unclear.

Bluetower
  • Welcome to Stack Overflow! You'll want to check out this post on making a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – rsoren Aug 09 '14 at 07:44

1 Answer


Once you read in your data as a character vector (called names in my example), you can do it like this:

> names <- c("bill", "bernice", "bob", "beatriz")
> names[nchar(names) < 5]
[1] "bill" "bob" 
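
For a file rather than a hand-typed vector, the same filter can be sketched as follows. This is a minimal sketch, assuming the file is plain text with exactly one username per line; the temporary file and the variable names `path`, `usernames`, and `short_names` are hypothetical stand-ins, not part of the original question.

```r
# Hypothetical file standing in for the real 30 MB list (assumed: one username per line).
path <- tempfile(fileext = ".txt")
writeLines(c("a0hrszq13k", "a0huod_cv4q", "a0hxyaszqfk", "a0hz", "a0i5dk349"), path)

# readLines() returns a character vector with one element per line of the file.
usernames <- readLines(path)

# Keep only the names with fewer than 5 characters.
short_names <- usernames[nchar(usernames) < 5]
print(short_names)
# [1] "a0hz"
```

With the real data, `path` would simply be the path to the username file.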
rsoren
  • But if I have thousands of names, would I have to put each name in parentheses? Or can I just do this: > names[nchar(names) < 5]? @Reed – Bluetower Aug 09 '14 at 07:46
  • No, you just need to read your data into R; at that point you'd use the solution I gave above. Here's a good site on importing data to R: http://www.r-tutor.com/r-introduction/data-frame/data-import. You'll want to look at `read.table()` in particular. If you need more help, feel free to post another question with an example of the text in your file. – rsoren Aug 09 '14 at 07:51
  • My apologies, I'm trying to reformat this. The code I used was `names = read.csv("b_2765605.csv")` followed by `names[nchar(names) < 4]`, which printed `data frame with 0 columns and 2765604 rows`. (There were already 2765604 rows in my original dataset, so it looks like something went wrong.) – Bluetower Aug 09 '14 at 08:08
  • I'm sorry, it's very difficult to learn how to position the text correctly... I even read the markdown doc and I'm still messing it all up. – Bluetower Aug 09 '14 at 08:14
  • I don't mean to be rude, but this is pretty basic stuff in R. It's probably better for you to take a couple of hours and work through an intro course, for example, tryr.codeschool.com. In the meantime, you could try this: `names <- read.csv("b_2765605.csv", stringsAsFactors = FALSE)[, 1]; names[nchar(names) < 5]` It will work if all of the usernames are in the first column of your CSV file. – rsoren Aug 09 '14 at 09:27
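
The "0 columns" result in the comments above comes from indexing a data frame instead of a character vector. The sketch below illustrates the difference on a small stand-in file; the temporary file and the names `df`, `usernames`, and `short_names` are hypothetical. It also assumes the CSV has no header row, which is why `header = FALSE` is passed: by default `read.csv()` treats the first line as column names.

```r
# Hypothetical stand-in for "b_2765605.csv" (assumed: one username per line, no header row).
path <- tempfile(fileext = ".csv")
writeLines(c("a0hrszq13k", "a0huod_cv4q", "a0hxyaszqfk", "a0hz", "a0i5dk349"), path)

# read.csv() returns a data frame, not a character vector.
df <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

# Indexing a data frame with a single logical vector selects columns, not rows,
# so this drops the only column instead of filtering the usernames.
ncol(df[nchar(df) < 4])

# Extract the first column to get a plain character vector, then filter it.
usernames <- df[, 1]
short_names <- usernames[nchar(usernames) < 5]
print(short_names)
```

Extracting the column with `[, 1]` is the same step the last comment applies directly to the `read.csv()` result.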