R spread issue: After adding new updated data, spread() creates a waterfall data frame

Question

I am trying to build a yearly data frame from input data that is updated monthly. The data is historical (yearly supply and demand estimates for world crops, by country, going back to 1960), and changes usually occur only in the most recent years. I thought it would be easier to append only the updated rows than to re-add the full, mostly duplicate, data set every month. The starting data looks like:

Original Data Frame (FAS)

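library(dplyr)
library(tidyr)

# Keep the corn rows and only the columns needed for the yearly table
Corn_US_World_Model <- FAS %>%
  group_by(Market_Year) %>%
  filter(Commodity == "Corn") %>%
  select(Attribute, Country, Market_Year, Thousand_MT)

# Number the rows within each year, spread the years into columns, then drop the helper id
Corn_US_World_Model_test <- Corn_US_World_Model %>%
  group_by(Market_Year) %>%
  mutate(grouped_id = row_number()) %>%
  spread(Market_Year, Thousand_MT) %>%
  select(-grouped_id)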

After spreading, I end up with a waterfall like this:

Waterfall with NA

What I would like is:

Desired

When I filter() for a single country first, the result is not a waterfall. When I run the same code across all of the countries in the data set, it is.
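
For illustration, here is a minimal sketch with made-up numbers that runs the same steps as above. The `toy` data frame is hypothetical, not the real FAS data. It points at one possible source of the pattern: `spread()` treats every column other than the key and value as part of the row identifier, so `grouped_id` helps define a row. If a country's position within a year differs between years, its values land on separate rows. Whether this is what happens in the full data set is an assumption that would need checking against the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Afghanistan has one year, Albania has two.
toy <- tibble(
  Attribute   = c("Exports", "Exports", "Exports"),
  Country     = c("Afghanistan", "Albania", "Albania"),
  Market_Year = c(2007, 2006, 2007),
  Thousand_MT = c(500, 10, 20)
)

# Same steps as in the question, with an explicit ungroup() before spread().
spread_by_year <- function(df) {
  df %>%
    group_by(Market_Year) %>%
    mutate(grouped_id = row_number()) %>%
    ungroup() %>%
    spread(Market_Year, Thousand_MT) %>%
    select(-grouped_id)
}

# All countries: Albania is row 1 in 2006 but row 2 in 2007,
# so its two values end up on two different output rows.
wide_all <- spread_by_year(toy)

# One country: its row number is 1 in every year, so it stays on one row.
wide_one <- spread_by_year(filter(toy, Country == "Albania"))

print(wide_all)
print(wide_one)
```

In this toy case the all-countries result has more rows than there are Attribute/Country pairs, while the single-country result does not.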

It shouldn't do that, but are you sure all of the names are exactly identical? Basically, what you get is what I'd expect if there were 2 rows for "Afghanistan", and 2 for "Afghanistan " (with a trailing space) — Emil Bode, Dec 10 '18 at 16:51
This is a situation where it's hard to help without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Like @EmilBode said, there might be a whitespace issue that we can't see without your actual data. — camille, Dec 10 '18 at 17:02
The function `dput` is most useful in these circumstances. `dput(FAS)` will give you a string that anyone can copy and paste to get the exact same data.frame, including "invisible" details such as whitespace. — Emil Bode, Dec 10 '18 at 17:04
Looking at the data you've shown, you have two entries for Afghanistan exports in 2007 (500 and 550). How are you deciding which figure to show in the final output? I think you should summarise your data first to have just one entry per attribute/country/year combination. — Chris, Dec 10 '18 at 18:00
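
The checks suggested in the comments above could be sketched roughly as follows. The `corn` data frame is a hypothetical stand-in for the corn subset of FAS; its values are invented, apart from the 500 and 550 figures mentioned in the last comment. Keeping the last row for each combination is only an assumption (that later rows are the newer updates). Any other rule, such as `max()` or `mean()`, would fit in the same place.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with a trailing-space country name and a duplicated
# Attribute/Country/Market_Year combination.
corn <- tibble(
  Attribute   = c("Exports", "Exports", "Exports", "Exports"),
  Country     = c("Afghanistan", "Afghanistan", "Afghanistan ", "Albania"),
  Market_Year = c(2007, 2007, 2006, 2007),
  Thousand_MT = c(500, 550, 450, 20)
)

# 1. Country names that change when surrounding whitespace is trimmed.
padded <- corn %>%
  filter(Country != trimws(Country)) %>%
  distinct(Country)

# 2. After trimming, combinations that occur more than once.
corn_clean <- corn %>% mutate(Country = trimws(Country))
dupes <- corn_clean %>%
  count(Attribute, Country, Market_Year) %>%
  filter(n > 1)

# 3. Reduce to one value per combination, then spread without a helper id.
#    last() assumes that rows appended later are the more recent figures.
corn_wide <- corn_clean %>%
  group_by(Attribute, Country, Market_Year) %>%
  summarise(Thousand_MT = last(Thousand_MT)) %>%
  ungroup() %>%
  spread(Market_Year, Thousand_MT)

print(padded)
print(dupes)
print(corn_wide)
```

To share data for a reproducible example, something like `dput(head(corn_clean, 20))` prints a copy-pasteable definition of the first rows, including any invisible whitespace.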


0 Answers