If you have a predefined vocabulary with potentially multiple terms selected simultaneously - e.g. a set of non-mutually-exclusive tweet
categories that you are interested in - then you can have a binary vector in which each bit represents one of the categories.
If the categories are mutually exclusive then what could you hope to achieve by clustering? Specifically there would be no "gray area" in which some observations belong to CategorySet-A, others to CategorySet-B and others to some in-between combination. If every observation is hard-capped at one category than you have discrete points not clusters.
If instead you wish to cluster based on similar sets of words - then you might need to know the "vocabulary" up-front - which in this case means: "what are the tweet terms that I care about". In that case you can use a bag of words
model https://machinelearningmastery.com/gentle-introduction-bag-words-model/ to compare the tweets - and then cluster based on the generated vectors.
Now if you are uncertain of the vocabulary apriori - which is the likely case here since you do not know what would be the content of the next tweet - then you will likely resort to re-clustering on a regular basis - as you gain new words. You can then use an updated bag of words
that includes the newly "seen" terms. Note that this incurs processing cost and latency. To avoid the cost/latency you have to decide ahead of time which terms to restrict your clustering on: which may be possible if you're interested in a targeted subject.