Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

197 questions
636
votes
10 answers

How do I split a string into a list of words?

How do I split a sentence and store each word in a list? e.g. "these are words" ⟶ ["these", "are", "words"] To split on other delimiters, see Split a string by a delimiter in python. To split into individual characters, see How do I split a…
Thanx
  • 7,403
  • 5
  • 21
  • 12
178
votes
17 answers

How to get the first word of a sentence in PHP?

I want to extract the first word of a variable from a string. For example, take this input: The resultant output should be Test, which is the first word of the input. How can I do this?
ali
  • 1,847
  • 2
  • 12
  • 10
93
votes
15 answers

Converting a String to a List of Words?

I'm trying to convert a string to a list of words using python. I want to take something like the following: string = 'This is a string, with words!' Then convert to something like this : list = ['This', 'is', 'a', 'string', 'with',…
rectangletangle
  • 50,393
  • 94
  • 205
  • 275
84
votes
12 answers

Android Word-Wrap EditText text

I have been trying to get my EditText box to word wrap, but can't seem to do it. I have dealt with much more complicated issues while developing Android applications, and this seems like it should be a straightforward process. However, the issue…
Bryan
  • 3,629
  • 2
  • 28
  • 27
69
votes
10 answers

Python: Cut off the last word of a sentence?

What's the best way to slice the last word from a block of text? I can think of: (1) splitting it into a list (by spaces), removing the last item, then reconcatenating the list; (2) using a regular expression to replace the last word. I'm currently taking…
qwerty
  • 717
  • 1
  • 5
  • 5
52
votes
12 answers

Replace a whole line where a particular word is found in a text file

How can I replace a particular line of text in file using php? I don't know the line number. I want to replace a line containing a particular word.
kishore
  • 1,017
  • 3
  • 12
  • 21
51
votes
3 answers

Haskell file reading

I have just recently started learning Haskell and I am having a lot of trouble trying to figure out how file reading works. For example, I have a text file "test.txt" containing lines with numbers: 32 4 2 30 300 5 I want to read each line and then…
DustBunny
  • 860
  • 2
  • 11
  • 25
34
votes
6 answers

How to break up document by sentences with Spacy

How can I break a document (e.g., paragraph, book, etc.) into sentences with spaCy? For example, "The dog ran. The cat jumped" into ["The dog ran", "The cat jumped"].
Ulad Kasach
  • 11,558
  • 11
  • 61
  • 87
29
votes
4 answers

Is there any good open-source or freely available Chinese segmentation algorithm available?

As phrased in the question, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I do understand it is a very difficult task to solve, as there are many ambiguities involved. I know there's Google's API, but well it is…
Sebastian
  • 6,293
  • 6
  • 34
  • 47
24
votes
6 answers

how to remove first word from a php string

I'd like to remove the first word from a string using PHP. Tried searching but couldn't find an answer that I could make sense of. eg: "White Tank Top" so it becomes "Tank Top" Thanks
Monkeyalan
  • 241
  • 1
  • 2
  • 3
21
votes
6 answers

Split a string to a string of valid words using Dynamic Programming

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem: You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all…
Pet
  • 211
  • 1
  • 2
  • 3
21
votes
10 answers

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

What is the right way to split a string into words? (The string doesn't contain any spaces or punctuation marks.) For example: "stringintowords" -> "String Into Words". Could you please advise what algorithm should be used here? Update: For those who…
Termos
  • 664
  • 1
  • 7
  • 31
20
votes
7 answers

Python extract sentence containing word

I am trying to extract all the sentences containing a specified word from a text. txt="I like to eat apple. Me too. Let's go buy some apples." txt = "." + txt re.findall(r"\."+".+"+"apple"+".+"+"\.", txt) but it is returning me : [".I like to eat…
user2187202
  • 337
  • 1
  • 3
  • 11
19
votes
1 answer

How to break words into syllables in LaTeX correctly

I am writing my MSc with LaTeX and I have the problem that sometimes my words are divided in the wrong way. My language is Spanish and I'm using the babel package. How could I solve it? For example: propuestos appears as prop-uestos (uestos on the next line). It…
legami
  • 1,303
  • 6
  • 22
  • 31
19
votes
7 answers

fixing words with spaces using a dictionary look up in python?

I have extracted the list of sentences from a document. I am pre-processing this list of sentences to make it more sensible. I am faced with the following problem: I have sentences such as "more recen t ly the develop ment, wh ich is a po ten t " I…
suzee
  • 563
  • 4
  • 25