Tidyr gather function is retaining the original wide dataset rows with placeholder data and placing the long data below them - fix?

Question

For some reason, tidyr::gather() is keeping the original rows of a wide dataset that I am trying to transform to long format, and placing the long data below those rows. The data frame was read in from a CSV.

My original data is set up like this, but with 30+ columns for different gas species:

X      Date Treatment Carbon.dioxide.CO2_mean Methane.CH4_mean  ... 
1 10/2/2018      1A01                 14886.2         2.194333  ...             
2 10/2/2018      1A27                352313.8        18.034400  ...            
3 10/2/2018      1A35                112027.4         7.994200  ...            
4 10/2/2018      1A60                181449.2         5.270500  ...
...

This is my code:

long.mean.myfiles.subset <- removed.mean.myfiles.subset %>% 
  gather(Gas_Species, Gas_Concentration_PPM, -c(Date, Treatment))

This is the output I expect:

#         Date Treatment                    Gas_Species Gas_Concentration_PPM
#1    10/2/2018      1A01        Carbon.dioxide.CO2_mean          1.488620e+04
#2    10/2/2018      1A27        Carbon.dioxide.CO2_mean          3.523138e+05
#3    10/2/2018      1A35        Carbon.dioxide.CO2_mean          1.120274e+05

This is what I'm getting:

         Date Treatment                    Gas_Species Gas_Concentration_PPM
1   10/2/2018      1A01                              X          1.000000e+00
2   10/2/2018      1A27                              X          2.000000e+00
3   10/2/2018      1A35                              X          3.000000e+00
4   10/2/2018      1A60                              X          4.000000e+00
5   9/12/2018      1A01                              X          5.000000e+00
6   9/12/2018      1A27                              X          6.000000e+00
...
25  10/2/2018      1A01        Carbon.dioxide.CO2_mean          1.488620e+04
26  10/2/2018      1A27        Carbon.dioxide.CO2_mean          3.523138e+05
27  10/2/2018      1A35        Carbon.dioxide.CO2_mean          1.120274e+05
28  10/2/2018      1A60        Carbon.dioxide.CO2_mean          1.814492e+05

The original wide dataset has 24 rows (not counting the header) and roughly 40 columns (one for each gas species), plus the associated Date and Treatment columns. I want to keep just Date and Treatment and create Gas_Species and Gas_Concentration_PPM columns from the rest.

There are NA values I want to retain.

I originally generated the correct output with this same code, before this issue occurred. I have cleared the global environment, any .RData, and .Rhistory, restarted R, and tried again with no success. I can't find any documentation of this issue elsewhere. Does anyone know why this is happening and how to fix it?

Without you posting a reproducible example using `dput()` it's really tough to understand what your problem is. Your output looks as expected to me, given your code. — dylanjm, Feb 26 '19 at 21:50
It looks like you have a column named `X` that contains the row names. I don't think you want this as part of the columns you are going to gather together, and I'm guessing it is causing the problem. — aosmith, Feb 26 '19 at 21:50
@dylanjm I'm confused by what you're saying. My code is very clearly in the question under the section "This is my code". I have examples of my original df (removed.mean.myfiles.subset) and my outputs (the output I expected is the output I got previously, before I reran the code and the problem occurred). How can I clarify more? — Kaliber, Feb 26 '19 at 22:06
@Kaliber try from our perspective to recreate your problem. Start a new rscript file and use only what you have provided. It's not polite to think that people will not only answer your question but input the data manually into a data.frame to reproduce your problem. [Here is a good place to review how to make a reprex](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Not trying to be rude just helping you get better, more in-depth answers. — dylanjm, Feb 26 '19 at 22:11
@aosmith thank you it looks like that is the issue. R usually numbers the rows in the console but that is being carried over by one of my previous functions and being added as a new column if I output to a csv at any point. — Kaliber, Feb 26 '19 at 22:18
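
For reference, here is a minimal sketch of the fix discussed in the comments above. It assumes the stray `X` column comes from row names written out by `write.csv()` and read back in. The toy data frame `wide` is a hypothetical stand-in for `removed.mean.myfiles.subset`, and the temporary file is only for illustration.

```r
library(tidyr)

# Hypothetical stand-in for removed.mean.myfiles.subset, including the stray X column
wide <- data.frame(
  X = 1:2,
  Date = c("10/2/2018", "10/2/2018"),
  Treatment = c("1A01", "1A27"),
  Carbon.dioxide.CO2_mean = c(14886.2, 352313.8),
  Methane.CH4_mean = c(2.194333, 18.034400),
  stringsAsFactors = FALSE
)

# Drop the row-number column so gather() does not treat it as a gas species
wide$X <- NULL

# Gather every remaining column except Date and Treatment.
# na.rm defaults to FALSE, so NA concentrations are kept.
long <- gather(wide, Gas_Species, Gas_Concentration_PPM, -c(Date, Treatment))

# To stop X from reappearing, write the CSV without row names
tmp <- tempfile(fileext = ".csv")
write.csv(wide, tmp, row.names = FALSE)
roundtrip <- read.csv(tmp, stringsAsFactors = FALSE)
names(roundtrip)
```

Excluding `X` inside the call instead, with `-c(X, Date, Treatment)`, would keep it as a third identifier column rather than dropping it, so removing it beforehand is closer to the expected output shown in the question.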
@dylanjm thank you for the additional resources I will do my best to review and implement them. My biggest R weakness is syntax. I have a hard time translating examples using dummy or seed data to something that will work for my purposes and data design which is why I'm not so great at reprexes. I should have specified that I have a poor R foundation and was looking for someone more versed in tidyr to review my dataframe and syntax for errors not to attempt to rewrite or even rerun my code at all. I will try to troubleshoot my syntax errors on my own before turning to SOF in the future. — Kaliber, Feb 26 '19 at 22:43
@Kaliber you don't need to generate fake data you can use the function `dput()` on `removed.mean.myfiles.subset` like `dput(removed.mean.myfiles.subset)` and then paste the output in your post and that will let us easily input the data into our R environments. — dylanjm, Feb 26 '19 at 22:44
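
A small illustration of the `dput()` suggestion, using base R only. The data frame `example_df` is a made-up stand-in for `removed.mean.myfiles.subset`; with the real data you would call `dput()` on that object, or on `head()` of it to keep the output short.

```r
# Hypothetical stand-in for the first rows of removed.mean.myfiles.subset
example_df <- data.frame(
  Date = c("10/2/2018", "10/2/2018"),
  Treatment = c("1A01", "1A27"),
  Methane.CH4_mean = c(2.194333, 18.0344),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that text into the question
dput(head(example_df, 2))

# Anyone can rebuild the data by evaluating the printed text
printed <- paste(capture.output(dput(example_df)), collapse = "\n")
rebuilt <- eval(parse(text = printed))
all.equal(rebuilt, example_df)
```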
