from nltk.tokenize import RegexpTokenizer
s = "Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\n\nThanks."
tokenizer = RegexpTokenizer(r'\w+|\$[\d\.]+|\S+')
tokenizer.tokenize(s)
Would this code be considered O(n)?
Based on what I read in the NLTK documentation, "a RegexpTokenizer splits a string into substrings using a regular expression". I'm assuming that matching the regular expression against the string would be O(1), and that splitting the string into substrings with tokenizer.tokenize(s) would then be O(n), where n is the number of characters in the input. Thank you for any clarification.
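One way to sanity-check the assumption empirically is to time tokenization on inputs of increasing length and see whether the runtime grows roughly linearly. The sketch below is an assumption, not NLTK's actual internals: it treats `tokenize` as essentially `re.findall` with the same pattern, which matches the documented behavior of splitting on regex matches.

```python
import re
import time

# Hypothetical stand-in for RegexpTokenizer.tokenize: findall with the
# same pattern. This is an assumption about the implementation, used only
# to measure how runtime scales with input length n.
pattern = re.compile(r'\w+|\$[\d\.]+|\S+')

def tokenize(s):
    return pattern.findall(s)

base = "Good muffins cost $3.88 in New York. Please buy me two of them. "
for factor in (1, 10, 100):
    text = base * factor
    start = time.perf_counter()
    tokens = tokenize(text)
    elapsed = time.perf_counter() - start
    print(f"n={len(text):>6} chars -> {len(tokens):>5} tokens in {elapsed:.6f}s")
```

If the pattern has no catastrophic backtracking (this one is a simple alternation of bounded token shapes), the elapsed times should scale close to linearly with n.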