My problem is simple. I have a pandas dataframe with 124957 different tweets (all related to one central topic). Each date has more than one tweet (around 300 per day).
My goal is to perform sentiment analysis on the tweets of each day. To do this, I am trying to combine all tweets from the same day into one string, so that each date has exactly one string.
To achieve this, I have tried the following:
indx=0
get_tweet=""
for i in range(0,len(cdata)-1):
    get_date=cdata.date.iloc[i]
    next_date=cdata.date.iloc[i+1]
    if(str(get_date)==str(next_date)):
        get_tweet=get_tweet+cdata.text.iloc[i]+" "
    if(str(get_date)!=str(next_date)):
        cdata.loc[indx,'date'] = get_date
        cdata.loc[indx,'text'] = get_tweet
        indx=indx+1
        get_tweet=" "
df.to_csv("/home/development-pc/Documents/BTC_Tweets_1Y.csv")
My problem is that only a small sample of the data is actually converted to my format of choice.
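For reference, the result I am after could be sketched with a groupby instead of a manual loop. This is only a minimal illustration on made-up data: the toy `cdata` frame and the `daily` name are hypothetical, and it assumes the `date` column holds one value per calendar day and that `text` contains only strings (no missing values).

```python
import pandas as pd

# Toy stand-in for the real dataframe (hypothetical data, same column names).
cdata = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-02"],
    "text": ["btc up", "btc down", "hodl", "to the moon", "sell"],
})

# Group the rows by date and join each day's tweets into one string.
# Assumes every value in "text" is a str; NaN values would make join fail.
daily = (
    cdata.groupby("date", sort=True)["text"]
    .agg(" ".join)
    .reset_index()
)

print(daily)
```

The output has one row per date, with all of that day's tweets separated by a single space.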
I do not know whether it matters, but the dataframe consists of three separate datasets that I combined into one using "pd.concat". After that, I sorted the new dataframe by date (ascending order) and reset the index, because it was reversed (the last entry (2020-01-03) had index 0 and the first entry (2019-01-01) had index 124958).
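The preparation step was roughly the following. This is a sketch from memory rather than my exact code: the three `part_*` frames are hypothetical placeholders for the real datasets, and converting `date` with `pd.to_datetime` is an assumption about how the column is stored.

```python
import pandas as pd

# Hypothetical placeholders for the three separate datasets.
part_a = pd.DataFrame({"date": ["2019-01-02"], "text": ["b"]})
part_b = pd.DataFrame({"date": ["2019-01-01"], "text": ["a"]})
part_c = pd.DataFrame({"date": ["2020-01-03"], "text": ["c"]})

# Combine the datasets into one frame.
cdata = pd.concat([part_a, part_b, part_c], ignore_index=True)

# Parse the dates (assumption), sort ascending, and rebuild the index
# so that label 0 is the earliest row.
cdata["date"] = pd.to_datetime(cdata["date"])
cdata = cdata.sort_values("date", kind="stable").reset_index(drop=True)

print(cdata)
```

I mention this because the loop above writes with `cdata.loc[indx, ...]`, which is label-based, so the state of the index after sorting affects which rows get written.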
Thanks in advance, Filippos