
I would like to reshape my data for some code I am working on. Below are the first observations (index 1 to 50) in their current format: each individual fish has its own line with the observation number, species, length (mm), weight (kg), and the mesh size (in inches) of the net it was caught in.

fish_data <- read.table(header = TRUE,
text = "Index Species Length  Weight  mesh
1   SVCP    450     1.26    4
2   SVCP    584     2.24    3
3   SVCP    586     2.46    3
6   SVCP    590     2.4     3
7   SVCP    590     2.04    3
8   SVCP    594     2.62    3
9   SVCP    595     2.24    3
10  SVCP    595     2.04    3
11  SVCP    596     2.46    3
12  SVCP    603     2.6     3
13  SVCP    603     2.44    3
14  SVCP    604     2.68    3
15  SVCP    604     2.48    3
16  SVCP    606     2.06    3
17  SVCP    609     3.74    5
18  SVCP    609     2.44    3
20  SVCP    611     2.56    3
30  SVCP    618     2.52    3
31  SVCP    620     2.66    3
32  SVCP    620     2.66    3
33  SVCP    621     2.72    3
34  SVCP    625     2.8     3
36  SVCP    625     2.08    3
37  SVCP    626     2.74    3
38  SVCP    627     2.09    3
39  SVCP    627     2.82    3
40  SVCP    628     2.8     3
41  SVCP    630     2.68    3
42  SVCP    630     2.82    3
43  SVCP    637     3       3
45  SVCP    639     2.54    3
47  SVCP    640     3.01    3
49  SVCP    643     3.36    3
50  SVCP    644     6.82    4.25")

I would like to change the format to something like the example below, where the first column is the mesh size of the net and each subsequent column is the number of observations in a specific length bin (for example 101-105 mm, 106-110 mm, 111-115 mm, etc.). I will be using 10 mm length bins.

52.5  52  11   1   1   0   0   0   0
54.5 102  91  16   4   4   2   0   3
56.5 295 232 131  61  17  13   3   1
58.5 309 318 362 243  95  26   4   3
60.5 118 173 326 342 199 100  10  11
62.5  79  87 191 239 202 201  39  15
64.5  27  48 111 143 133 185  72  25
66.5  14  17  44  51  52 122  74  41
68.5   8   6  14  23  25  59  65  76
70.5   7   3   8  14  15  16  34  33
72.5   0   3   1   2   5   4   6  15
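For reference, one base-R way to get this kind of mesh-by-length-bin layout is `cut()` followed by `table()`. This is a sketch under stated assumptions, not the poster's code: it uses only a few rows copied from `fish_data` above, and the breaks (440 to 650 mm in 10 mm steps) and the `"441-450"`-style labels are hypothetical choices made to cover those lengths.

```r
# Hypothetical subset: a few (Length, mesh) pairs copied from the question's fish_data
fish_data <- data.frame(
  Length = c(450, 584, 586, 590, 609, 609, 620, 644),  # mm
  mesh   = c(4,   3,   3,   3,   5,   3,   3,   4.25)   # inches
)

# Assumed 10 mm bins, right-closed: (440, 450], (450, 460], ..., (640, 650]
breaks <- seq(440, 650, by = 10)
labels <- paste(head(breaks, -1) + 1, tail(breaks, -1), sep = "-")  # "441-450", ...

# cut() assigns each fish to a bin; the resulting factor keeps empty bins as levels
Length_bin <- cut(fish_data$Length, breaks = breaks, labels = labels)

# Cross-tabulate: one row per mesh size, one column per length bin
counts <- table(mesh = fish_data$mesh, Length_bin = Length_bin)
counts
```

Because `cut()` returns a factor with every bin as a level, bins with no fish still appear as columns of 0, which matches the layout above. For the full data, the breaks would need to span the whole length range.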
Nate
  • Please review how to share your data in a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) format – Conor Neilson Nov 20 '18 at 23:58
  • It is not clear how the second data table relates to the first. Please explain how the rows in the second table are computed. – Taher A. Ghaleb Nov 21 '18 at 00:05
  • They are not related; it is just an example of what I need to do. The rows are counts for a specific mesh size and the number of fish in each size bin. For example, in the 1st row the 1st value is the mesh size of the net (52.5 units of measure), and the 2nd value (52) is the number of fish in a certain size bin caught in that net. – fishy_stats Nov 21 '18 at 00:12
  • Hey fishy, welcome to Stack Overflow. Next time you post a question, use that `read.table()` pattern to share data. – Nate Nov 21 '18 at 00:21
  • `hist(..., plot = FALSE)` will put your data into histogram bins. Specify `breaks = c(...)` for your bin intervals – Scransom Nov 21 '18 at 00:23
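The `hist(..., plot = FALSE)` suggestion can be sketched by splitting the lengths by mesh size and binning each group with the same breaks. The breaks, the bin labels, and the small subset of rows are assumptions for illustration. Note that `hist()` stops with an error if any length falls outside the breaks, so `seq()` would need widening for the full data.

```r
# Hypothetical subset: a few (Length, mesh) pairs copied from the question's fish_data
fish_data <- data.frame(
  Length = c(450, 584, 586, 590, 609, 609, 620, 644),
  mesh   = c(4,   3,   3,   3,   5,   3,   3,   4.25)
)

# Assumed 10 mm breaks; hist() errors if any length falls outside them
breaks <- seq(440, 650, by = 10)

# Split lengths by mesh size, then bin each group without drawing a plot
by_mesh <- split(fish_data$Length, fish_data$mesh)
counts <- t(sapply(by_mesh, function(len) {
  hist(len, breaks = breaks, plot = FALSE)$counts  # one count per bin
}))

# Readable bin labels; hist() bins are right-closed (the first also includes 440)
colnames(counts) <- paste(head(breaks, -1) + 1, tail(breaks, -1), sep = "-")
counts
```

The result is a plain matrix with one row per mesh size (taken from the names that `split()` produces) and one column per bin.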

1 Answer


Here's an approach using dplyr and tidyr from the tidyverse meta-package. First I create a new variable, Length_bin, that assigns each fish to a 5 mm bin (replace the 5 with 10 for 10 mm bins), then count how many fish of each mesh size are in each bin, then spread from long format to wide format.

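library(tidyverse)  # attaches dplyr (mutate, count) and tidyr (spread)
fish_data %>%
  mutate(Length_bin = floor(Length / 5) * 5) %>%  # lower edge of each 5 mm bin
  count(mesh, Length_bin) %>%                     # number of fish per mesh size and bin
  spread(Length_bin, n, fill = 0)                 # long to wide: one column per bin, 0 if empty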

# A tibble: 4 x 15
#   mesh `450` `580` `585` `590` `595` `600` `605` `610` `615` `620` `625` `630` `635` `640`
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  3        0     1     1     3     3     4     2     1     1     3     6     2     2     2
#2  4        1     0     0     0     0     0     0     0     0     0     0     0     0     0
#3  4.25     0     0     0     0     0     0     0     0     0     0     0     0     0     1
#4  5        0     0     0     0     0     0     1     0     0     0     0     0     0     0
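`spread()` still works but has been superseded in tidyr by `pivot_wider()`. Below is a sketch of the same pipeline with 10 mm bins, as the question asks, using `pivot_wider()`. It assumes tidyr 1.1.0 or later (for a scalar `values_fill`) and uses only a few rows copied from the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset: a few (Length, mesh) pairs copied from the question's fish_data
fish_data <- data.frame(
  Length = c(450, 584, 586, 590, 609, 609, 620, 644),
  mesh   = c(4,   3,   3,   3,   5,   3,   3,   4.25)
)

wide <- fish_data %>%
  mutate(Length_bin = floor(Length / 10) * 10) %>%  # lower edge of each 10 mm bin
  count(mesh, Length_bin) %>%                       # fish per mesh size and bin
  arrange(Length_bin) %>%                           # so bin columns come out in order
  pivot_wider(names_from = Length_bin, values_from = n,
              values_fill = 0) %>%                  # 0 for empty mesh/bin pairs
  arrange(mesh)                                     # one row per mesh size, ascending
wide
```

Unlike a `cut()` plus `table()` approach, this only creates columns for bins that contain at least one fish; `tidyr::complete()` could be used before pivoting if empty bins must appear as well.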
Jon Spring