
I asked this question previously, and Frank answered it here. Original question:

I would like to count islands along the rows of a .csv. By "islands" I mean runs of consecutive non-blank entries in a row. A run of three or more consecutive non-blank entries counts as 1 island. A run of fewer than three consecutive entries counts as 1 "non-island". I would then like to write the output to a dataframe:

I have slightly changed the input .csv so that it now includes multiple islands/gaps per row, meaning a row is no longer simply an "island" row or a "non-island" row. Does anyone have any advice?

Input .csv:

Name,,,,,,,,,,,,,
Michael,,,1,1,1,,,,1,,,,
Peter,,,,1,1,,,,,,,,,
John,,,,,1,,,,,,,,,
Erin,,,,,1,1,,,,1,1,,,

Desired dataframe output:

Name,island,nonisland,
Michael,1,1,
Peter,0,1,
John,0,1,
Erin,0,2
agrobins
  • Look at the `rle` function. You really _ought_ to post the input code, doncha' think? – IRTFM Jun 05 '15 at 02:53
  • Have a look at [how to make an R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It's best if you provide your data in a way that people can just copy and paste into R without any issues. `dput` is a great way to do that. This just requires a slight modification of the code in the previous answer: `sapply(apply(df, 1, rle), function(x) sum(x$lengths[!is.na(x$values)] < 3))` – Jota Jun 05 '15 at 03:05
  • Thank you for the comments. I'm new to Stack Overflow and to coding in general. I'll use `dput` in the future to make it easier for users willing to help. Frank, this solution works; I appreciate your help! – agrobins Jun 05 '15 at 18:22

1 Answer

Building on the code from the previous question, a slight modification adds the nonisland column:

# sample data
df <- read.csv(text="
,,,,,,,,,,,,,
Michael,,,1,1,1,,,,1,,,,
Peter,,,,1,1,,,,,,,,,
John,,,,,1,,,,,,,,,
Erin,,,,,1,1,,,,1,1,,,")

output <- stack(sapply(apply(df, 1, rle), 
            function(x) sum(x$lengths >= 3)))

output$nonisland <- sapply(apply(df, 1, rle), 
                      function(x) sum(x$lengths[!is.na(x$values)] < 3))

names(output) <- c("island", "names", "nonisland")

#  island   names nonisland
#1      1 Michael         1
#2      0   Peter         1
#3      0    John         1
#4      0    Erin         2
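
To see why the filters above work, it may help to look at what `rle` returns for a single row. The sketch below is illustrative only: the `row` vector is typed in by hand from Michael's line, and `count_runs` is a hypothetical helper that is not part of the original answer. It assumes blanks are read in as `NA`. Unlike the code above, it runs `rle` on `!is.na(row)`, so a stretch of differing non-blank values (say 1, 2, 1) would still count as one run.

```r
# Michael's row from the example, with blanks as NA (typed by hand here).
row <- c(NA, NA, 1, 1, 1, NA, NA, NA, 1, NA, NA, NA, NA)

runs <- rle(row)
# rle() does not merge NAs: each NA becomes its own run of length 1,
# so only non-NA runs can ever reach length 3.
filled <- runs$lengths[!is.na(runs$values)]  # lengths of non-blank runs: 3 1

# Hypothetical helper: count islands and non-islands for one row.
count_runs <- function(row, min_len = 3) {
  r <- rle(!is.na(row))          # TRUE runs are consecutive non-blank entries
  filled <- r$lengths[r$values]  # keep only the non-blank runs
  c(island = sum(filled >= min_len), nonisland = sum(filled < min_len))
}

count_runs(row)
```

With a data frame like `df` above, something along the lines of `t(apply(df, 1, count_runs))` should give one row of counts per name, though that has only been reasoned through against the sample data shown here.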
Jota