Bag-of-Words (BoW) is a text model in which word order, grammar, syntax, etc. are ignored and only the presence (or count) of words is considered. It's as if you took the words from a piece of text, plopped them into a bag, and shook it up Scrabble-style. This is sometimes also called the Naive Bayes assumption because word probabilities are considered irrespective of order; I find that name confusing given the Naive Bayes machine learning model, but these are different things that happen to share a name. The BoW model is used in text classification and information retrieval, to name a couple of applications.
In the most general case, you start with a training corpus of positive documents (of the class you're looking for) and negative documents (not of the class). The corpus is examined and every unique word (symbol) in it is identified. This symbol list is called the feature set. Using the feature set, a vector is generated to represent each document in the training corpus. The vector consists of either binary values (feature present/absent in the document) or numbers (frequency of the feature in the document). These vectors are a BoW representation of the corpus and can be used to train a model such as an SVM. Once a model is trained, vectors can be generated from documents "in the wild" and the model can classify each document as belonging to the positive or negative class with a particular likelihood.
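To make the vectorization step concrete, here's a minimal sketch in plain Python (no libraries). The tiny corpus, the `to_vector` helper, and the `binary` flag are all illustrative, not a standard API:

```python
# Toy training corpus; in practice these would be full documents.
corpus = [
    "the team won the football match",   # e.g. a positive (sports) document
    "the stock market fell sharply",     # e.g. a negative document
]

# Feature set: every unique word (symbol) seen in the training corpus.
feature_set = sorted({word for doc in corpus for word in doc.split()})

def to_vector(doc, binary=False):
    """Represent a document as a vector over the feature set:
    binary presence/absence, or per-document word counts."""
    words = doc.split()
    if binary:
        return [1 if feature in words else 0 for feature in feature_set]
    return [words.count(feature) for feature in feature_set]

vectors = [to_vector(doc) for doc in corpus]
```

A document "in the wild" is vectorized the same way; words not in the feature set simply contribute nothing.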
With any substantial corpus it's typical to have tens of thousands, hundreds of thousands, or even millions of unique symbols. To get good classification performance, a process known as dimensionality reduction (or feature reduction) is performed. Feature reduction seeks to eliminate the symbols that are least effective at classifying, leaving only the most relevant features to consider. For example, the word "the" appears in almost all text and so is of no value in separating documents into classes, while the word "football" would be of high value in sorting documents into those related to sports and those that aren't. Dimensionality reduction is a deep subject all by itself. Here's another Stack question where it's addressed in a bit of detail.
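One simple form of feature reduction is filtering by document frequency: drop symbols that appear in nearly every document (like "the") or in almost none. A sketch, where the corpus and the `min_df`/`max_df` thresholds are illustrative choices, not fixed values:

```python
from collections import Counter

docs = [
    "the team won the football match",
    "the football season starts soon",
    "the stock market fell sharply",
    "the bank raised interest rates",
]
n_docs = len(docs)
doc_word_sets = [set(d.split()) for d in docs]

# Document frequency: in how many documents does each word appear?
df = Counter(word for words in doc_word_sets for word in words)

# Keep words that are neither ubiquitous nor vanishingly rare.
# min_df=2 and max_df=0.9 are arbitrary thresholds for this example.
min_df, max_df = 2, 0.9
reduced_features = sorted(
    word for word, count in df.items()
    if min_df <= count <= max_df * n_docs
)
```

Here "the" (present in all four documents) and the one-off words are dropped, while "football" (present in exactly the two sports documents) survives as a discriminative feature.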
There are other variations, such as the use of N-grams (N consecutive words treated as a single symbol). Google "bag of words text classification" and you will find many academic papers, blog posts, books, etc. that describe this technique in much greater detail and explore the many ways of optimizing performance for a variety of applications. There are also tools for almost any language that simplify implementing a BoW text classifier; Google your language of choice and "bag of words". I hope this gets you started.
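For completeness, here's what generating word N-grams looks like; the `ngrams` helper is just an illustrative name. Each N-gram then becomes one symbol in the feature set, exactly like a single word:

```python
def ngrams(text, n=2):
    """Return every run of n consecutive words as a single symbol."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams("new york city football club", 2)
# Each bigram, e.g. "new york", is treated as one feature.
```

Bigrams and trigrams capture some local word order that plain BoW throws away, at the cost of a much larger feature set, which makes the feature reduction step above even more important.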