I want to delete rows 1 to 94875 of my dataset (the Global Terrorism Database from Kaggle) so that I can work only with the attacks between 2010 and 2017. I also want to count the rows for each year, so that in the end I can run a linear regression on the increase in attacks over the years.

dplyr::filter(globalterrorismdb_0718dist, grepl('1970', iyear))

I already tried this to filter the rows, but it doesn't get me any further.

fabla
Fabian

1 Answer

You can also do it directly on the data frame:

# this will overwrite your existing dataframe...

# select only entries from row 94876 to end of table
your_df <- your_df[94876:nrow(your_df),]

# filter for timerange
your_df <- your_df[(your_df$iyear >= 2010 & your_df$iyear <= 2017),]

If you use a different name on the left-hand side of `<-`, you'll create a new data frame instead and leave the original one unchanged.
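The question also asks about counting the rows per year and fitting a linear regression, which the filtering above does not cover. A minimal base-R sketch of one way to do that is shown here; the toy `your_df` values and the column name `n_attacks` are assumptions for illustration only, and the real filtered data frame would take their place.

```r
# Toy stand-in for the filtered data: one row per attack, year in `iyear`.
# (Hypothetical values; replace with the real filtered data frame.)
your_df <- data.frame(iyear = c(2010, 2010, 2011, 2012, 2012, 2012, 2013))

# Count the rows (attacks) for each year.
# `n_attacks` is a name chosen here, not a column of the original dataset.
counts <- as.data.frame(table(iyear = your_df$iyear),
                        responseName = "n_attacks",
                        stringsAsFactors = FALSE)

# table() stores the years as labels, so convert them back to numbers
counts$iyear <- as.integer(counts$iyear)

# Linear regression of the yearly count on the year
fit <- lm(n_attacks ~ iyear, data = counts)
coef(fit)
```

The slope reported by `coef(fit)` for `iyear` is then the estimated change in the number of attacks per year.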

LeroyFromBerlin
  • How do I save it, so that it won't jump back to the original dataframe? – Fabian Jan 20 '20 at 15:12
  • You can either overwrite your existing dataframe or create a new one. I'll adjust my answer. – LeroyFromBerlin Jan 20 '20 at 15:15
  • OK, that worked well! Now I have saved each year between 2000 and 2017 in its own variable. How can I create a plot which has the number of attacks on the y-axis and the years on the x-axis? (I chose the variables a-r for 2000-2017.) – Fabian Jan 21 '20 at 07:59
  • That's the code: `gf_point(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r~iyear, data = globalterrorism)` – Fabian Jan 21 '20 at 08:07
  • That really depends on the kind of plot you are aiming for, the easiest being `plot(df)`. This should rather be a separate question, however. – LeroyFromBerlin Jan 21 '20 at 08:09
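For the plot asked about in the comments, one possible approach is to skip the per-year variables and aggregate the data frame into a single table of attacks per year, then plot that. The following is only a sketch in base R; the toy `globalterrorism` values and the helper column `n_attacks` are assumptions for illustration, not part of the original data.

```r
# Toy stand-in for the data: one row per attack, year in `iyear`.
# (Hypothetical values; replace with the real data frame.)
globalterrorism <- data.frame(iyear = c(2000, 2000, 2001, 2002, 2002, 2002))

# Give every row a count of 1 and sum those counts within each year.
# `n_attacks` is a helper column introduced here for the example.
attacks_per_year <- aggregate(n_attacks ~ iyear,
                              data = transform(globalterrorism, n_attacks = 1),
                              FUN = sum)

# Years on the x-axis, number of attacks on the y-axis
plot(n_attacks ~ iyear, data = attacks_per_year,
     xlab = "Year", ylab = "Number of attacks")
```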