
I have a relatively simple question, but I am a novice in Python, so I need help.

I want to iterate over a column in Python where all the values are sentences, such as 'Friends+CCas+good result', 'just want everything to go smooth. serious', 'a mixture of both academic and non-academic', etc.

The Series I want to loop over:
First 'Friends+CCas+good result'
Second 'just want everything to go smooth. serious'

My goal is to concatenate all the strings in the column into a single one, in order to count the total number of occurrences of each word across the entire column. I found this method for two strings:

string = 'Hello '
string += 'World'
print(string)  # prints: Hello World

Then I can call string.split(). I tried a list comprehension and a loop, but neither gave the result I wanted for the entire column, which would be something like this:

'Friends+CCas+good result just want everything to go smooth. serious a mixture of both academic and non-academic'

That is, with a space between all the strings. I would then split the whole thing to get the total frequency of each word.

I hope I am clear enough.

Thank you in advance

azro
  • It is not quite clear what end result you expect. Can you please update the question and add the expected result for the sample data you have presented? – ThePyGuy Aug 28 '21 at 08:09

2 Answers


Assuming that by "column" you mean a Python list, you can iterate over the list and add each string with a space before it, like this:

full_str = ""
for sentence in list_name:
    full_str += " " + sentence
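The same concatenation can probably be written more compactly with `str.join`, and the word frequencies the question asks for can then be taken with `collections.Counter`. This is only a minimal sketch: the contents of `list_name` are sample values copied from the question, and splitting on whitespace is an assumption, so tokens such as 'Friends+CCas+good' and 'smooth.' stay intact.

```python
from collections import Counter

# Sample values taken from the question; stand-in for the real column.
list_name = [
    "Friends+CCas+good result",
    "just want everything to go smooth. serious",
]

# Join all sentences with a single space between them.
full_str = " ".join(list_name)

# Split on whitespace and count how often each token occurs.
word_counts = Counter(full_str.split())

print(full_str)
print(word_counts.most_common(3))
```

Unlike the loop, `" ".join(...)` does not leave a leading space, although `split()` would ignore it either way.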
Lecdi

I'd advise using a regex to extract the words:

import re

data = ['Friends+CCas+good result', 'just want everything to go smooth. serious']
re.findall(r'\b\w+\b', ' '.join(data))
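To turn the extracted words into frequencies without pandas, one option is to feed the list returned by `re.findall` into `collections.Counter`. The sketch below assumes counting should be case-insensitive, which is why the text is lowercased first; drop `.lower()` if case matters.

```python
import re
from collections import Counter

data = ['Friends+CCas+good result', 'just want everything to go smooth. serious']

# Lowercase the joined text, then pull out runs of word characters.
words = re.findall(r'\b\w+\b', ' '.join(data).lower())

# Map each word to the number of times it appears.
counts = Counter(words)

for word, n in counts.most_common():
    print(word, n)
```

Because `\w+` stops at '+' and '.', 'Friends+CCas+good' is split into three words and 'smooth.' loses its trailing period.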

Or use pandas:

import pandas as pd 

data = ['Friends+CCas+good result', 'just want everything to go smooth. serious']
df = pd.DataFrame(data, columns=['strings'])
df['strings'] = df['strings'].str.lower().str.findall(r'\b\w+\b')
df.explode('strings').stack().value_counts()
0
result 1
good 1
serious 1
friends 1
go 1
want 1
to 1
smooth 1
ccas 1
just 1
everything 1
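If the column is already a pandas Series, as the question suggests, the count can likely be done on it directly, without building a DataFrame or calling `stack()`. This is a sketch under that assumption; the name `s` and the third sample sentence (taken from the question) are only there for illustration.

```python
import pandas as pd

# Hypothetical Series standing in for the real column.
s = pd.Series([
    'Friends+CCas+good result',
    'just want everything to go smooth. serious',
    'a mixture of both academic and non-academic',
])

# Lowercase, extract the words of each row as a list,
# flatten the lists into one row per word, then count.
counts = s.str.lower().str.findall(r'\w+').explode().value_counts()

print(counts.head())
```

With the third sentence included, 'academic' is counted twice, because `\w+` splits 'non-academic' at the hyphen.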
RJ Adriaansen