For my master's thesis I am analysing a food security model, and the next element I need to obtain is the number of crisis transitions that took place in the researched period. A crisis transition occurs when the food security IPC value goes from 1 or 2 to 3, 4 or 5 during the standard forecasting period (which initially was 3 months and later 4 months, but that aside). So I would like to count the times that an area went from 1 or 2 to 3, 4 or 5. I have a long dataframe with a column for the period, the area (livelihood zone) and the IPC value. I put the link to two CSV files for you guys to download and check for yourselves.

What do you guys think is the best way to obtain this count per type of area? Let me know if you need additional information. I hope you guys can help, that would mean a lot!

`dput` output of the first 48 rows, i.e. two periods, each covering all 24 areas:

structure(list(`Livelihood zone` = c("Central Highlands, High Potential Zone", 
"Marsabit Marginal Mixed Farming Zone", "Northwestern Agropastoral Zone", 
"Southeastern Marginal Mixed Farming Zone", "Turkwell Riverine Zone", 
"Western High Potential Zone", "Tana Riverine Zone", "Southeastern Medium Potential, Mixed Farming Zone", 
"Northern Pastoral Zone", "Western Medium Potential Zone", "Western Lakeshore Marginal Mixed Farming Zone", 
"Southern Pastoral Zone", "Northeastern Pastoral Zone", "Mandera Riverine Zone", 
"Eastern Pastoral Zone", "Northeastern Agropastoral Zone", "Lake Turkana Fishing", 
"Lake Victoria Fishing Zone", "Western Agropastoral Zone", "Coastal Medium Potential Farming Zone", 
"Coastal Marginal Agricultural Mixed Farming Zone", "Southeastern Pastoral  Zone", 
"Northwestern Pastoral Zone", "Southern Agropastoral Zone", "Central Highlands, High Potential Zone", 
"Marsabit Marginal Mixed Farming Zone", "Northwestern Agropastoral Zone", 
"Southeastern Marginal Mixed Farming Zone", "Turkwell Riverine Zone", 
"Western High Potential Zone", "Tana Riverine Zone", "Southeastern Medium Potential, Mixed Farming Zone", 
"Northern Pastoral Zone", "Western Medium Potential Zone", "Western Lakeshore Marginal Mixed Farming Zone", 
"Southern Pastoral Zone", "Northeastern Pastoral Zone", "Mandera Riverine Zone", 
"Eastern Pastoral Zone", "Northeastern Agropastoral Zone", "Lake Turkana Fishing", 
"Lake Victoria Fishing Zone", "Western Agropastoral Zone", "Coastal Medium Potential Farming Zone", 
"Coastal Marginal Agricultural Mixed Farming Zone", "Southeastern Pastoral  Zone", 
"Northwestern Pastoral Zone", "Southern Agropastoral Zone"), 
    `Period of measurement Kenya` = c("2011-01", "2011-01", "2011-01", 
    "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", 
    "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", 
    "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", "2011-01", 
    "2011-01", "2011-01", "2011-01", "2011-04", "2011-04", "2011-04", 
    "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", 
    "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", 
    "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", "2011-04", 
    "2011-04", "2011-04", "2011-04"), `IPC class` = c(1, 3, 2, 
    2, 2, 1, 2, 2, 3, 1, 1, 2, 3, 3, 2, 3, 2, 1, 2, 2, 2, 2, 
    2, 2, 1, 3, 2, 2, 2, 1, 2, 2, 3, 1, 1, 2, 3, 3, 2, 3, 2, 
    1, 2, 2, 2, 2, 2, 2)), row.names = c(NA, 48L), class = "data.frame")

For the outcome I would like to have a dataframe which has a count of the crisis transitions per livelihood zone. Thanks in advance!

Mathijs-ve
  • Can you share a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data using `dput`? Based on that, show the desired outcome. This way it is easier to help you. – markus Jan 19 '20 at 12:40
  • It's much nicer if you share about 10 rows of data in your question in a copy/pasteable way (use `dput()` to generate the copy/pasteable version, e.g., `dput(your_data[1:10, ])`). Many people don't want to download strange files, put them in their working directory, and load unknown amounts of data into their R session. And the main idea for the site is that it becomes a good resource: if your Google Drive links ever go stale, the question becomes useless for future reference. So please share just enough data to illustrate your problem directly in the question. – Gregor Thomas Jan 19 '20 at 13:31
  • Okay, thanks for the feedback! Still very new to Stack Overflow ;). Is it okay like this or would you like me to put more rows up? – Mathijs-ve Jan 19 '20 at 13:35
  • I'd rather have fewer rows, but rows that show a crisis transition... a nice illustrative example would have, say, 5 rows each for 2 zones, with 2 or 3 crisis transitions. I think your data has 2 rows each for 24 zones with 0 crisis transitions. I'll put up an answer that I think works, but without any crisis transitions to test on I'm not 100% sure. – Gregor Thomas Jan 19 '20 at 13:52
  • Thanks again for the feedback! – Mathijs-ve Jan 19 '20 at 18:37

1 Answer

I think this should work. If it doesn't work, please share an example with crisis transitions that are incorrect so I can debug.

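library(dplyr)
df %>%
  # flag crisis-level observations (IPC class 3, 4 or 5)
  mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  # order each zone's observations chronologically
  arrange(`Livelihood zone`, `Period of measurement Kenya`) %>%
  group_by(`Livelihood zone`) %>%
  # a rise from 0 to 1 between consecutive periods is a crisis transition
  summarize(crisis_trans_count = sum(diff(crisis) > 0))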
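
Since the sample data in the question contains no crisis transitions, a small check on made-up data may help confirm the logic. The sketch below is only an illustration: the data frame `toy`, its short column names (`zone`, `period`, `ipc`) and its values are hypothetical and not taken from the real data set.

```r
library(dplyr)

# Hypothetical toy data (not from the question's data set):
# zone A rises into crisis twice (2 -> ... 1 -> 3, and 1 -> 5),
# zone B never reaches IPC class 3 or higher.
toy <- data.frame(
  zone   = rep(c("A", "B"), times = c(9, 4)),
  period = c(sprintf("2011-%02d", 1:9), sprintf("2011-%02d", 1:4)),
  ipc    = c(4, 5, 1, 2, 1, 3, 4, 1, 5,   1, 2, 2, 1)
)

toy_counts <- toy %>%
  mutate(crisis = as.integer(ipc %in% 3:5)) %>%   # 1 = crisis level, 0 = not
  arrange(zone, period) %>%                       # chronological within zone
  group_by(zone) %>%
  summarize(crisis_trans_count = sum(diff(crisis) > 0))

print(toy_counts)
```

For zone A the crisis flags are 1, 1, 0, 0, 0, 1, 1, 0, 1, so `diff()` is positive twice and the count is 2; zone B gets 0. Note that a zone whose first observation is already at crisis level (as zone A here) does not get a transition counted for that first row, because there is no earlier observation to compare against.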
Gregor Thomas
  • I think it works! Thanks a lot. If I have more questions concerning this subject, I'll post them here. – Mathijs-ve Jan 19 '20 at 18:37
  • One additional question: does the code above count only the 3:5 values that come directly after a 1 or 2? Because if it counts all the 3:5 values, it is not always a crisis transition. One new question: I also want to compare the IPC class column with a forecast column. If the crisis transition was incorrectly forecast, I want that to be summed in a new column by livelihood zone. What condition within the ifelse function do you think is correct? Up until now I've come up with ```ifelse(`IPC class` %in% 3:5 != `Forecast` %in% 3:5, 1, 0)```. Let me know and thanks in advance! – Mathijs-ve Jan 21 '20 at 13:24
  • Q1: That's my intent, but again, it's untested. You should test it yourself to make sure (you should test all code, whether it's code you write or code you find on Stack Overflow). Make yourself a test case where the IPC class goes something like `4, 5, 1, 2, 1, 3, 4, 1, 5` and make sure that the answer you get is what you expect. (More benefits of the minimal reproducible example: it makes a great test case.) Q2: that looks right, based on your brief description. Again, do a small test. – Gregor Thomas Jan 21 '20 at 14:00
  • You don't need to use backticks for regular variable names like `Forecast`. Only "non-standard" variable names that have spaces or punctuation other than `.` or `_` need backticks. – Gregor Thomas Jan 21 '20 at 14:00
  • Okay I'll check these tips! – Mathijs-ve Jan 22 '20 at 15:36
  • About the condition I suggested above: apparently it also gives a 1 when the IPC class is not in 3:5 and the forecast is in 3:5. This is not what I want, as this means that there was no crisis transition in the IPC class column. Do you know what condition I can use to make sure that it first checks whether the IPC class column is within the 3:5 range and only then checks whether it matches the forecast column? :) Let me know if I need to clarify more! Thanks in advance! – Mathijs-ve Feb 07 '20 at 16:19
  • I don't think I really understand your forecast comparison logic; you might need to ask a new question with a nice illustrative example and a better explanation. I think you may need an `&` in there, something like ``ifelse(`IPC class` %in% 3:5 & !(Forecast %in% 3:5), 1, 0)``, but that's just comparing the values, not the transitions. – Gregor Thomas Feb 07 '20 at 20:27
  • At first sight it did the trick, but it's still not correct. I'll make a new question with an illustrative example and explain it more properly. – Mathijs-ve Feb 10 '20 at 11:04
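
One possible reading of the forecast question, comparing transitions rather than raw values, could be sketched as follows. This is only a guess at the intended logic, under two assumptions that the thread does not confirm: the `Forecast` value on a row is the forecast for that same row's period, and a transition counts as "missed" when the observed class rises from 1 or 2 into 3:5 while the forecast for that period is not in 3:5. The data frame `toy_fc` and the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one zone, observed IPC class plus a forecast
# assumed to refer to the same period as the observation on that row.
toy_fc <- data.frame(
  zone     = rep("A", 6),
  period   = sprintf("2011-%02d", 1:6),
  ipc      = c(2, 3, 1, 4, 4, 2),
  Forecast = c(2, 3, 1, 2, 3, 3)
)

fc_summary <- toy_fc %>%
  arrange(zone, period) %>%
  group_by(zone) %>%
  mutate(
    # observed transition: previous period in 1:2, current period in 3:5
    # (the first row of each zone has no previous value, so it is FALSE)
    transition = dplyr::lag(ipc) %in% 1:2 & ipc %in% 3:5,
    # missed: a transition happened but the forecast was not at crisis level
    missed     = transition & !(Forecast %in% 3:5)
  ) %>%
  summarize(
    crisis_trans_count = sum(transition),
    missed_trans_count = sum(missed)
  )

print(fc_summary)
```

In this toy series there are two observed transitions (2 to 3, and 1 to 4); only the second was forecast as non-crisis, so `missed_trans_count` is 1. A forecast of 3:5 in a period without an observed transition (the last row) is deliberately ignored here; counting such false alarms would need a separate condition.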