I'm working on a personal project and ran into a result I don't understand. My aim was to split a list-type column into individual columns (each column holding one element of the list), and I was able to do that successfully. However, one way of implementing it doesn't give the result I want, even though the code appears to be exactly the same. I have two files, football_tweets.py and analyseFiles.py.
This is my code for football_tweets.py:

    import emoji
    import numpy as np
    import pandas as pd
    import regex

    class TweetAnalyser():
        #data = []
        def createDataFrame(self, tweets):
            # This function creates the dataframe
            df = pd.DataFrame(data=[tweet.full_text for tweet in tweets], columns=['tweets'])
            df['id'] = np.array([tweet.id for tweet in tweets])
            df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])
            df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
            df['created_at'] = np.array([tweet.created_at for tweet in tweets])
            df['emoji_code'] = np.array([tweet_analyser.check_emoji(tweet) for tweet in df['tweets']])
            df['tweet_sentiment'] = np.array([tweet_analyser.analyse_sentiment(tweet) for tweet in df['tweets']])
            return df

        def check_emoji(self, tweet):
            # Function to convert each emoji symbol/char into its Unicode code point
            emoji_list = []
            data = regex.findall(r'\X', tweet)
            senti_df = pd.read_csv('new_sentiment_data.csv')
            for word in data:
                if any(char in emoji.UNICODE_EMOJI for char in word):
                    # Translate the emoji to its Unicode code point
                    # and append that code to the list
                    try:
                        uni_code = f'U+{ord(word):X}'
                        emoji_list.append(uni_code)
                    except TypeError:
                        pass
            return emoji_list

(There are more functions, but these are the only ones relevant to the question.)
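For context on the `try`/`except` in `check_emoji`, here is a minimal sketch of how the `U+...` strings are produced. The helper name `to_code_point` is hypothetical and only wraps the same `ord()` formatting used above; the emoji are written as escape sequences so the snippet does not depend on file encoding.

```python
def to_code_point(grapheme: str) -> str:
    """Format a single-code-point string as 'U+XXXX' (hypothetical helper)."""
    return f'U+{ord(grapheme):X}'


# A single code point formats cleanly.
single = to_code_point('\U0001F60D')  # smiling face with heart-eyes
print(single)

# A grapheme made of several code points (raised fist + skin-tone modifier)
# is a string of length 2, so ord() raises TypeError.
try:
    to_code_point('\u270A\U0001F3FD')
    multi_failed = False
except TypeError:
    multi_failed = True
print(multi_failed)
```

This is why emoji that consist of more than one code point are silently skipped by `check_emoji` rather than added to the list.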
I ran the code as follows:

    if __name__ == '__main__':
        twitter_client = TwitterClient('Arsenal')
        tweet_analyser = TweetAnalyser()
        api = twitter_client.get_twitter_client_api()
        tweets = twitter_client.get_user_tweets(1212442388981002240, 1236413003127566337)
        df = tweet_analyser.createDataFrame(tweets)
        df.to_csv('tweet_file.csv')
        new_df = pd.DataFrame(df.emoji_code.values.tolist()).add_prefix('emoji_')
        print(new_df)
and I received the expected result:

         emoji_0  emoji_1  emoji_2  emoji_3  emoji_4  emoji_5
    0    U+1F60D     None     None     None     None     None
    1    U+1F3B6  U+1F4A7     None     None     None     None
    2    U+1F4AC  U+1F454  U+1F447     None     None     None
    3    U+1F3C6     None     None     None     None     None
    4    U+1F602  U+1F454     None     None     None     None
    ..       ...      ...      ...      ...      ...      ...
    373   U+270A     None     None     None     None     None
I then tried the same approach in a separate file, analyseFiles.py, as follows:

    def analyse_emoji():
        df = pd.read_csv('tweet_file.csv')
        senti_df = pd.read_csv('new_sentiment_data.csv')
        new_df = pd.DataFrame(df.emoji_code.values.tolist()).add_prefix('emoji_')
        print(new_df)

Printing new_df gave this result instead:

                                   emoji_0
    0                          ['U+1F60D']
    1               ['U+1F3B6', 'U+1F4A7']
    2    ['U+1F4AC', 'U+1F454', 'U+1F447']
    3                          ['U+1F3C6']
    4               ['U+1F602', 'U+1F454']
    ..                                 ...
    373                         ['U+270A']
Why did the second implementation not give me the expected result even though the code is the same? Is there a concept that I need to learn or brush up on? tweet_file.csv is where I stored the dataframe. The second solution reads the dataframe back from that file, whereas the first uses it directly after creating it. Is that where the problem occurs?
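A minimal sketch that appears to reproduce the difference, assuming the CSV round trip is the cause. The two-row frame is hypothetical, and an in-memory `io.StringIO` buffer stands in for tweet_file.csv. The `ast.literal_eval` step is one possible way to turn the stored text back into lists, not necessarily the best one.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row frame standing in for the real tweet data.
df = pd.DataFrame({'emoji_code': [['U+1F60D'], ['U+1F3B6', 'U+1F4A7']]})

# In memory, each cell is a Python list, so tolist() yields a list of lists
# and the DataFrame constructor spreads them over several columns.
in_memory = pd.DataFrame(df.emoji_code.values.tolist()).add_prefix('emoji_')

# Round-trip through CSV text (StringIO stands in for tweet_file.csv).
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After reloading, each cell is a str such as "['U+1F60D']", not a list.
cell_type_after_reload = type(reloaded.emoji_code.iloc[0])
print(cell_type_after_reload)

# Assumption: parsing each string as a Python literal restores the lists.
reloaded['emoji_code'] = reloaded['emoji_code'].apply(ast.literal_eval)
restored = pd.DataFrame(reloaded.emoji_code.values.tolist()).add_prefix('emoji_')
print(restored)
```

In this sketch the same `pd.DataFrame(...tolist())` line produces two columns before the round trip and again after the strings are parsed, which suggests the code is identical but the cell types are not.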
***Edit***

print(df) from football_tweets.py:

         tweets                                             ...  tweet_sentiment
    0    The crucial moment.\n\n @LacazetteAlex\n\n#AR...   ...         0.000000
    1    "...so fresh, so clean..."\n\n#ARSWHU https...     ...         0.333333
    2    "I'm really happy with the result because bi...    ...         0.600000
    3    Your man of the match today...\n\n @Bernd_Len...   ...         0.000000
    4    Just another day on the touchline \n\n @m8ar...    ...         0.000000
    ..   ...                                                ...              ...
    369  Let's keep this going! ✊\n\n#ARSMUN               ...         0.000000

print(df) from analyseFiles.py:

    0      0  ...  0.000000
    1      1  ...  0.333333
    2      2  ...  0.600000
    3      3  ...  0.000000
    4      4  ...  0.000000
    ..   ...  ...       ...
    373  373  ...  0.000000
This may be where the problem occurs.
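The extra leading column of row numbers in the second printout looks like the saved index being read back as an ordinary column. A minimal sketch of that behaviour, using a hypothetical one-column frame and in-memory buffers in place of tweet_file.csv; as far as I can tell this is separate from the list-versus-string difference:

```python
import io

import pandas as pd

# Hypothetical one-column frame standing in for the real tweet data.
df = pd.DataFrame({'tweet_sentiment': [0.0, 0.333333]})

# By default, to_csv() writes the index as an unnamed first column,
# which read_csv() then loads as a regular column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# Passing index=False skips the index column when writing.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

Reading the original file with `pd.read_csv('tweet_file.csv', index_col=0)` would be another way to treat that first column as the index again.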