
I have a dataset with two columns that I need to combine:

butterfly <- read.csv("Butterfly_Data_All.csv")

Red_Admiral_data <- butterfly[butterfly$Species == 'Red Admiral',]

pop_RA <- Red_Admiral_data$SINDEX # where SINDEX is the population index

summer_RA <- Red_Admiral_data$Average_Temp_May_June_July

winter_RA <- Red_Admiral_data$Average_Temp_Nov_Dec_Jan

summer_RA holds the average temperature over three summer months (May, June and July), one value per observation of the butterfly species 'Red Admiral':

13.39 12.24 12.32 12.11 12.41 12.21 12.28 11.83 11.88 11.73 11.99
11.75 14.91 14.83 14.43 14.91 13.46 14.99 14.56 15.04 13.70 11.10
16.04 14.34 15.02 14.30 15.17 14.55 12.82 14.34 13.32 15.32 13.97
14.64 10.27 15.26 14.94 14.22 14.82 14.82 15.15 14.88 14.77 12.64

winter_RA holds the same for three winter months (November, December and January):

5.25  5.33  5.31  5.00  5.34  5.24  4.70  7.04  7.03  6.72  7.06
6.30  5.29  5.24  5.82  5.22  5.76  6.56  5.08  5.33  7.15  4.67
5.77  6.58  4.84  6.80  5.06  5.14  6.49  5.80  6.86  5.20  5.54
4.85  3.27  6.29  5.32  5.47  4.78  4.78  5.19  5.05  5.12  4.89
5.25  5.33  5.31  5.00  5.34  5.24  4.70  7.04  7.03  6.72  7.06

The dataset is huge (21540 entries omitted), and when I use the merge() function it crashes the software, even with increased memory:

merge(summer_RA,winter_RA)

I want to plot the population index for this species (pop_RA) against the summer and winter temperatures combined, but so far I can only create a separate plot for each. Hope this makes sense.

dput(summer_RA) output is:

13.49, 12.25, 13.41, 13.18, 13.67, 13.12, 13.72, 13.16, 13.53, 14.01, 13.02, 9.91, 11.51, 13.05, 12.9, 12.32, 13.3, 13.3, 13.03, 13.03, 9.75, 13.13, 13.23, 13.03, 13.45, 10.52, 10.52, 13.64, 
8.26, 12.88, 13.79, 12.46, 9.31, 13.3, 13.98, 13.11, 13.3, 12.85

dput(winter_RA) output is:

4.67, 4.16, 4.51, 4.55, 4.68, 5.13, 4.3, 4.39, 4.16, 5.02, 4.29, 4.17, 4.61, 5.18, 6.15, 6.15, 4.34, 6.15, 5.39, 5.39, 6.01, 5.18, 4.78, 5.39, 4.39, 5.02, 5.02, 4.48, 5.04, 4.13, 3.73, 3.16, 3.36, 4.13, 4.13, 4.13, 4.13, 3.29, 3.39, 3.29, 3.79, 3.79, 3.79, 4.43

pop_RA sample data:

 10   1   2   0   5   0   2   0   0   0   4   0  31   1  27  22 17   3   2  21  33  17  11  21   1  20   4   8  11  13  53   6  51   3  41  43  40   7   7   0   0   8  11  15  22   9   1   0  33   4   0   5   3  15   5   0   1   6   0   0   1   6   1   1
Jose
Remy Bear
  • Are you just dealing with two vectors, or are these pieces of a `data.frame`? I doubt that `merge` is the right logic here. Please provide the output from `dput(x)`, where `x` is a sample of about this size from `summer_RA` (and from `winter_RA` as well). I don't know how `pop_RA` fits into this, but perhaps we need a representative sample of that as well. – r2evans Dec 23 '21 at 14:18
  • Hi, I've edited the post with more info. Hope this is all ok, I'm very new to R so not sure if it's right, thanks – Remy Bear Dec 23 '21 at 14:33
  • Now that I know more of the data, `dput(summer_RA)` was not quite right, but your edit gave me a better idea of what we're dealing with: *one frame*. I don't know what you mean when you say you want to "combine" the two columns. Please edit your question and replace all of the other `dput` output with `dput(head(Red_Admiral_data,20))` (I used 20 rows, but really enough to play with). From there, please add what you expect out of this, *literally* ... a new column? a new frame? something else? – r2evans Dec 23 '21 at 14:51
  • Hi Remy Bear! Please edit your question as suggested by @r2evans. I think your machine may be crashing because using `merge()` without any `id` column pairs every value of one vector with every value of the other (a Cartesian product), so the resulting data frame has as many rows as the product of the two lengths. `merge()` is normally used with a common id column to avoid that behavior, e.g. `merge(x, y, by = 'id', all.y = T)`, where `id` is a column present in both `x` and `y`. – Jose Dec 23 '21 at 15:07
  • Using `merge` on two columns of the same frame completely misinterprets the purpose of `merge`. For a discussion of "merge" (and "join") operations, see https://stackoverflow.com/q/1299871/3358272 and https://stackoverflow.com/q/5706437/3358272, and realize that I think you should not be doing that. – r2evans Dec 23 '21 at 15:10
  • Where are Red Admirals sighted, geographically, in the winter? Just for my knowledge, Costa Rica? Thanks. – Chris Dec 23 '21 at 17:50
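
A minimal sketch of the behavior described in the comments above. It uses small made-up vectors (`x`, `y`, `a` and `b` are hypothetical stand-ins, not the real data). When `merge()` finds no common column, it returns the Cartesian product, so the row count is the product of the two lengths. A shared `id` column instead gives one row per matching key.

```r
# Hypothetical stand-ins for summer_RA and winter_RA (3 observations each)
x <- c(13.49, 12.25, 13.41)
y <- c(4.67, 4.16, 4.51)

# merge() on two plain vectors: each becomes a one-column data frame,
# there is no common column, so every x is paired with every y
crossed <- merge(x, y)
nrow(crossed)
# [1] 9

# With a shared key column, merge() matches rows instead of crossing them
a <- data.frame(id = 1:3, summer = x)
b <- data.frame(id = 1:3, winter = y)
joined <- merge(a, b, by = "id", all.y = TRUE)
nrow(joined)
# [1] 3
```

With n values in each vector, the unkeyed merge produces n * n rows, so on a large dataset it can easily exhaust memory.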

2 Answers


I think there is a simple solution for this:

Let's say you create a "season" data frame combining winter and summer. You can use data.frame() to put the two vectors side by side as columns, like this:

season <- data.frame(summer_RA, winter_RA)
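Here is a sketch of that approach. It assumes the vectors were all extracted from the same rows of `Red_Admiral_data` and therefore have equal lengths. The short vectors below are hypothetical placeholders. `data.frame()` lays the values side by side, one row per observation, with no join and no growth in size. `pop_RA` can be included in the same call so it is ready to plot.

```r
# Hypothetical placeholder values; in practice these are the columns extracted
# from Red_Admiral_data, so element i of each vector belongs to the same record
pop_RA    <- c(10, 1, 2, 0)
summer_RA <- c(13.49, 12.25, 13.41, 13.18)
winter_RA <- c(4.67, 4.16, 4.51, 4.55)

# Column-bind the vectors: row i holds the population index and both
# seasonal temperatures for observation i
season <- data.frame(pop_RA, summer_RA, winter_RA)
dim(season)
# [1] 4 3
```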
Jose

I'm going to guess that by "combine" you mean "plot together". It's just a guess.

I cannot use the sample data you provided: the vectors all have different lengths (which is not valid for a data.frame), so I'll generate random data in the same "shape".
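A quick illustration of why unequal lengths are a problem, using made-up vectors. `data.frame()` recycles a shorter column only when its length divides the longer one evenly. Otherwise it stops with an error.

```r
# Made-up vectors of lengths 3 and 2: 2 does not divide 3, so data.frame() refuses
res <- tryCatch(
  data.frame(summer = c(13.49, 12.25, 13.41), winter = c(4.67, 4.16)),
  error = function(e) e
)
inherits(res, "error")
# [1] TRUE
conditionMessage(res)  # reports the differing numbers of rows

# Lengths 4 and 2: 2 divides 4, so the shorter column is silently recycled
recycled <- data.frame(summer = c(13.49, 12.25, 13.41, 13.18), winter = c(4.67, 4.16))
nrow(recycled)
# [1] 4
```

Silent recycling is the more dangerous case. The frame builds without complaint, but its rows no longer pair values from the same observation.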

Red_Admiral_data <- data.frame(pop_RA = c(10,1,2,0,5,0,2,0,0,0,4,0,31,1,27,22,17,3,2,21,33,17,11,21,1,20,4,8,11,13,53,6,51,3,41,43,40,7,7,0,0,8,11,15,22,9,1,0,33,4,0,5,3,15,5,0,1,6,0,0,1,6,1,1))
nrow(Red_Admiral_data)
# [1] 64
set.seed(42)
Red_Admiral_data$summer_RA <- rnorm(64, 12.6, 1.39)
Red_Admiral_data$winter_RA <- rnorm(64, 4.53, 0.78)
head(Red_Admiral_data)
#   pop_RA summer_RA winter_RA
# 1     10  14.50563  3.962712
# 2      1  11.81507  5.545983
# 3      2  13.10475  4.791962
# 4      0  13.47968  5.340035
# 5      5  13.16193  5.248168
# 6      0  12.45249  5.092285

base R

plot(summer_RA ~ pop_RA, data = Red_Admiral_data, type = "p", pch = 16, col = "blue", ylim = c(0, 16))
points(winter_RA ~ pop_RA, data = Red_Admiral_data, pch = 16, col = "red")
legend("bottomright", c("summer_RA", "winter_RA"), col = c("blue", "red"), pch = 16, bg = "white")

(Image: base graphics scatterplot)
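The hard-coded `ylim = c(0, 16)` suits this simulated data, but with other values it may clip points. The following is a sketch that derives the limits from both temperature columns instead, reusing the same plotting calls on a smaller simulated frame. `y_lim` is a name introduced here. `pdf(NULL)` only sends output to a null device so the snippet runs without a display; omit it, and the matching `dev.off()`, in an interactive session.

```r
set.seed(42)
Red_Admiral_data <- data.frame(pop_RA = c(10, 1, 2, 0, 5, 0))
Red_Admiral_data$summer_RA <- rnorm(6, 12.6, 1.39)
Red_Admiral_data$winter_RA <- rnorm(6, 4.53, 0.78)

# y-axis limits spanning both seasons, so no point falls outside the plot
y_lim <- range(c(Red_Admiral_data$summer_RA, Red_Admiral_data$winter_RA))

pdf(NULL)  # null device for non-interactive use
plot(summer_RA ~ pop_RA, data = Red_Admiral_data, pch = 16, col = "blue",
     ylim = y_lim, ylab = "Average temperature")
points(winter_RA ~ pop_RA, data = Red_Admiral_data, pch = 16, col = "red")
legend("right", c("summer_RA", "winter_RA"), col = c("blue", "red"), pch = 16, bg = "white")
invisible(dev.off())
```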

ggplot2

ggplot2 really prefers data in a long format, so we'll first reshape. (Reshaping/pivoting is not always trivial; look for other questions about the reshape2 package or tidyr::pivot_*.)

library(ggplot2)
ggplot(reshape2::melt(Red_Admiral_data, "pop_RA", variable.name = "season"), 
       aes(pop_RA, value)) +
  geom_point(aes(color = season))

(Image: ggplot2 scatterplot, same data)
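If you would rather not add the reshape2 dependency, you can build the same long layout in base R. This is a minimal sketch. It assumes `Red_Admiral_data` has exactly the three columns `pop_RA`, `summer_RA` and `winter_RA`, as constructed above; the frame is re-created here with a few hypothetical values so the block runs on its own. The column names `season` and `value` mirror what the `reshape2::melt()` call produced, and `long_RA` is a name introduced for illustration.

```r
# Small stand-in for Red_Admiral_data (hypothetical values, same column names)
Red_Admiral_data <- data.frame(
  pop_RA    = c(10, 1, 2),
  summer_RA = c(14.51, 11.82, 13.10),
  winter_RA = c(3.96, 5.55, 4.79)
)

n <- nrow(Red_Admiral_data)
# Stack the two temperature columns, repeating pop_RA once per season
long_RA <- data.frame(
  pop_RA = rep(Red_Admiral_data$pop_RA, times = 2),
  season = factor(rep(c("summer_RA", "winter_RA"), each = n)),
  value  = c(Red_Admiral_data$summer_RA, Red_Admiral_data$winter_RA)
)
nrow(long_RA)
# [1] 6
```

`long_RA` can then replace the `melt()` call: `ggplot(long_RA, aes(pop_RA, value)) + geom_point(aes(color = season))`.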

r2evans