Creating a variable based on a word count within a variable

Question

I have a data set containing countries and their constitutions. Is there a way to create a variable that shows how many times the word "god" appears in the constitution text?

The data set looks as follows:

Country Year Preamble
Afghanistan 2004 In the name of Allah...
Albania 1998 We, the people of Albania...
... .... .......

and so on. I am particularly interested in whether there is a function that can count how many times a specific word is used within a categorical variable, or whether there is a better way to accomplish what I am trying to do.

Welcome to SO. Click on the "r" below your question. Go to the "info" tab and read up on some expectations for how to specify questions including sample data. That should help you help others to help you. Further, this is a pretty broad question and you've kind of asked 2 questions in one, so you might want to consider paring it down a bit and include said sample data (or link to a sample on a reputable file sharing site or GitHub/GitLab). — hrbrmstr, Nov 25 '18 at 21:09
Sorry, I really try to make it clear. I am very new to coding and have a hard time describing what I am trying to accomplish. Do you have any advice as to how I could make it more specific? — Ian Chamberlin, Nov 25 '18 at 21:21
What you need to do is make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In particular, share an example of your data using `dput(your_df)`. Please edit the question with this info (don't post the data in comments) — Conor Neilson, Nov 25 '18 at 21:24
@IanChamberlin I'll make it more direct. The process I asked you to follow goes to here: https://stackoverflow.com/tags/r/info. One of those links is what Conor provided. Please do some fundamental reading to learn how to make it easier for folks to help you. — hrbrmstr, Nov 25 '18 at 21:37

morgan121 · Answer 1 · 2018-11-25T22:32:46.477

0

Say you want to find which rows of the above dataset contain 'Al'. You can use grep like this:

For only one column:

 grep("Al", data$Preamble)

For all columns:

 lapply(data, function(x) grep("Al", x))
  $Country
  [1] 2

  $Year
  integer(0)

  $Preamble
  [1] 1 2

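This tells you which rows of each column contain a match: row 2 of the 'Country' column, and rows 1 and 2 of the 'Preamble' column. Note that grep returns the indices of the matching rows, not the number of times the pattern occurs within each row.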
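To get an actual per-row count stored as a new variable, one possible base R approach is to combine gregexpr, regmatches and lengths. The sketch below is only an illustration: the `toy` data frame and its text are invented for the example, and `count_word` and `god_count` are hypothetical names, not part of the original question or answer.

```r
# Invented sample data; the real Preamble column would come from your own data set
toy <- data.frame(
  Country  = c("A", "B", "C"),
  Preamble = c("In the name of God, we thank God.",
               "We, the people...",
               "A godless text that still mentions god"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count whole-word, case-insensitive occurrences per element
count_word <- function(text, word) {
  text <- as.character(text)                 # also handles factor columns
  pattern <- paste0("\\b", word, "\\b")      # \\b restricts matches to whole words
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  lengths(regmatches(text, hits))            # number of matches in each row
}

# New variable holding the count for each country
toy$god_count <- count_word(toy$Preamble, "god")
toy$god_count
# 2 0 1
```

The word boundaries keep "godless" from being counted; drop them from the pattern if partial matches should count as well.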

edited Nov 25 '18 at 22:32

answered Nov 25 '18 at 21:48

