-2

I have a text like this:

text = "renoncent au développement. Au lieu de cela,elles s'attaquent à la jugulaire: investir dans un bien immobilier en exploitation qui génère des bénéfices.Avant d'investir, donnée s'est comportée en tant que grand promoteur. Pour déterminer si un projet 'offre potentiel' de profit réaliste,  pesez les antécédents de la et l'équilibre risque récompense potentiel de tout nouveau projet majeur. Souvent, qui cherche une approche intermédiaire formera un partenariat ou une coentreprise avec une entreprise qui est déjà sur le terrain et qui réalise des profits."

I want to build a list from this text that contains every word in the text.

chikabala
  • What is your question? This isn't a discussion forum or tutorial. Please take the [tour] and take the time to read [ask] and the other links found on that page. – wwii May 24 '21 at 18:03
  • Why would you want to work with Pandas? You do not need it in this case. – Bas May 24 '21 at 18:11
  • `words = []` `[words.append(word) for word in text.replace(',','').split(" ")]` `words ` – claudius May 24 '21 at 18:26

4 Answers

2

You can add the words to a set so that there won't be any duplicates, and remove the commas if they are not required:

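words = set()
for word in text.split(" "):
    # Remove commas before adding; the set drops duplicates automatically
    words.add(word.replace(',', ''))
# A token that was only a comma is now an empty string, so discard that
words.discard('')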
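Note that the question's text has places where a space is missing after punctuation ("cela,elles", "bénéfices.Avant"), so splitting on spaces alone leaves those pairs glued together. A minimal sketch of one way around that, assuming commas and full stops should always separate words; the function name `unique_words` is only illustrative:

```python
import re

def unique_words(text):
    # Split on any run of whitespace, commas or full stops, so that
    # "cela,elles" yields two separate words.
    tokens = re.split(r"[\s,.]+", text)
    # Drop the empty strings that appear when the text starts or ends
    # with a separator; the set comprehension removes duplicates.
    return {token for token in tokens if token}

sample = "Au lieu de cela,elles investissent. Au lieu"
print(unique_words(sample))
```

As with any set, the order of the result is not guaranteed to match the order of the words in the text.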
Yeshwanth N
1

You can strip ',' from each word while adding it to the list. You can also use `OrderedDict` from the `collections` module to remove duplicates while keeping the original order.

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."
from collections import OrderedDict

words = []
for word in text.split(" "):
    words.append(word.strip(","))  # Remove ',' from the ends of the word
list1 = list(OrderedDict.fromkeys(words))  # Remove duplicates, keeping first-seen order
print(list1)
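On Python 3.7 and later, a plain `dict` also preserves insertion order, so the same idea can be sketched without `OrderedDict`. The version below additionally strips all ASCII punctuation (not just commas) from the ends of each word, which is an assumption about what is wanted; the function name `ordered_unique_words` is illustrative:

```python
import string

def ordered_unique_words(text):
    # Strip ASCII punctuation from both ends of each whitespace-separated token.
    stripped = (token.strip(string.punctuation) for token in text.split())
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(word for word in stripped if word))

print(ordered_unique_words("freedom, equality and solidarity; freedom, security and justice."))
# ['freedom', 'equality', 'and', 'solidarity', 'security', 'justice']
```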
1

This is not the most efficient approach, since each `word not in results_list` check scans the whole list, but it works using only lists.

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."

def get_unique_words(text):
    # converts all alphabetical characters to lower
    lower_text = text.lower()
    # splits string on space character 
    split_text = lower_text.split(' ')

    # empty list to populate unique words
    results_list = []
    # iterate over the list
    for word in split_text:
        # check to see if value is already in results lists
        if word not in results_list:
            # append the word if it is unique
            results_list.append(word)
    return results_list

results = get_unique_words(text)

print(results)

This prints (note that punctuation stays attached to the words, e.g. 'heritage,'):

['conscious', 'of', 'its', 'spiritual', 'and', 'moral', 'heritage,', 'the', 'union', 'is', 'founded', 'on', 'indivisible,', 'universal', 'values', 'human', 'dignity,', 'freedom,', 'equality', 'solidarity;', 'it', 'based', 'principles', 'democracy', 'rule', 'law.', 'places', 'individual', 'at', 'heart', 'activities,', 'by', 'establishing', 'citizenship', 'creating', 'an', 'area', 'security', 'justice.']
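If the list scan becomes a bottleneck on longer texts, one common variation is to track already-seen words in a set alongside the result list. This is only a sketch of that idea, not part of the answer above, and the name `get_unique_words_fast` is made up for illustration:

```python
def get_unique_words_fast(text):
    seen = set()      # fast membership checks
    results = []      # keeps words in first-seen order
    # split() with no argument splits on any run of whitespace
    for word in text.lower().split():
        if word not in seen:
            seen.add(word)
            results.append(word)
    return results

print(get_unique_words_fast("The Union and the union"))
# ['the', 'union', 'and']
```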
Joe Thor
1
list(set(text.split(" ")))

This variant also removes the commas, but it is a bit less readable:

list(set(''.join(text.split(",")).split(" ")))
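If every word is wanted, in order and with duplicates kept, a regular expression may read more clearly than chained `split` and `join` calls. The sketch below assumes that an apostrophe between letters (as in "s'attaquent") belongs to the word, while quotes around a word do not; the pattern and the name `extract_words` are illustrative choices rather than the only reasonable ones:

```python
import re

# One or more word characters, optionally followed by groups of
# "apostrophe + word characters" (so "d'investir" stays in one piece).
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")

def extract_words(text):
    # In Python 3, \w is Unicode-aware for str patterns, so accented
    # letters are treated as word characters.
    return WORD_PATTERN.findall(text)

sample = "cela,elles s'attaquent a la 'offre potentiel' fin.Avant"
print(extract_words(sample))
```

Wrapping the result in `set(...)` or `dict.fromkeys(...)` gives the de-duplicated variants discussed in the other answers.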
Bas