After converting the Time column of my dataframe from UTC to Eastern time and saving the result to a new CSV file, I tried to draw a time plot of tweet frequency. The plot worked while the timezone was UTC, but after the conversion to Eastern it gives me the error below. How should I fix it?
import pandas as pd
import matplotlib.pyplot as plt
time_interval = pd.offsets.Second(10)
fig, ax = plt.subplots(figsize=(6, 3.5))
ax = (
    pd.read_csv('converted_timezone_tweets.csv', parse_dates=['Time'])
    .resample(time_interval, on='Time')['ID']
    .count()
    .plot.line(ax=ax)
)
plt.show()
And the error is:
/scratch/sjn/anaconda/bin/python /scratch2/debate_tweets/temporal_analysis.py
Traceback (most recent call last):
File "/scratch2/debate_tweets/temporal_analysis.py", line 18, in <module>
pd.read_csv('converted_timezone_tweets.csv', parse_dates=['Time'])
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/io/parsers.py", line 411, in _read
data = parser.read(nrows)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1005, in read
ret = self._engine.read(nrows)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1748, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
Process finished with exit code 1
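One way to narrow down a tokenizing error like this is to check whether every record in the file has the same number of fields as the header. The following is only a diagnostic sketch using Python's standard csv module. find_bad_records is a hypothetical helper, and the last row of the sample is made up to show what a malformed record could look like. It does not claim to identify the actual cause in my file.

```python
import csv
import io


def find_bad_records(text, expected_fields):
    """Return (record_number, field_count) for records with an unexpected width."""
    bad = []
    # newline="" leaves line endings untouched so csv.reader can handle
    # newlines embedded inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    for record_number, row in enumerate(reader, start=1):
        if len(row) != expected_fields:
            bad.append((record_number, len(row)))
    return bad


# Header and the multi-line tweet are taken from the converted file;
# the final row is HYPOTHETICAL (an unquoted comma adds a seventh field).
sample = """,Candidate,ID,Time,Username,Tweet
5,Clinton,788948671756955650,2016-10-19 23:43:15-04:00,isaac_urner,"The Clinton campaign already has redirecting to their site. That's what a real campaign looks like.
#badhombres2016"
6,Both,0,2016-10-19 23:43:16-04:00,hypothetical_user,unquoted, extra comma
"""

bad_records = find_bad_records(sample, expected_fields=6)
print(bad_records)
```

The quoted tweet that spans two lines counts as a single record here, so only the made-up row is reported. To run the same check on the real file, the text could be read with open('converted_timezone_tweets.csv', newline='').read().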
converted_timezone_tweets.csv looks like this:
,Candidate,ID,Time,Username,Tweet
0,Clinton,788948653016842240,2016-10-19 23:43:11-04:00,Tamayo_castle,Hillary Clinton dresses as Christian Bale at the debate via /r/pics
1,Clinton,788948666501464064,2016-10-19 23:43:14-04:00,ThinkCenter1968,"It's like I told my kids, a reason U don't want 2 vote 4 Hillary is U want the inheritance I'm leaving U, Right? They changed their minds!"
2,Clinton,788948673594097664,2016-10-19 23:43:16-04:00,21stCenRevolt,When hearing about Saudi Arabia murdering people for being gay. Hillary laughed with glee. She disgusting and disgraceful. #debatenight
3,Both,788948662881751040,2016-10-19 23:43:13-04:00,mikeywan,MEGYN IS A PAID HILLARY WHORE #TrumpPence2016 #TrumpTrain
4,Both,788948675313696769,2016-10-19 23:43:16-04:00,erwoti,Can't wait to hear @realDonaldTrump call that Nasty Woman (Hillary Clinton) - Madam President #debatenight #ChrisWallace
5,Clinton,788948671756955650,2016-10-19 23:43:15-04:00,isaac_urner,"The Clinton campaign already has redirecting to their site. That's what a real campaign looks like.
#badhombres2016"
The same code works for valid_tweets.csv and produces the expected plot.
The lines of valid_tweets.csv look like this:
Candidate,ID,Time,Username,Tweet
Clinton,788948653016842240,2016-10-20 03:43:11+00:00,Tamayo_castle,Hillary Clinton dresses as Christian Bale at the debate via /r/pics
Clinton,788948666501464064,2016-10-20 03:43:14+00:00,ThinkCenter1968,"It's like I told my kids, a reason U don't want 2 vote 4 Hillary is U want the inheritance I'm leaving U, Right? They changed their minds!"
Clinton,788948673594097664,2016-10-20 03:43:16+00:00,21stCenRevolt,When hearing about Saudi Arabia murdering people for being gay. Hillary laughed with glee. She disgusting and disgraceful. #debatenight
Both,788948662881751040,2016-10-20 03:43:13+00:00,mikeywan,MEGYN IS A PAID HILLARY WHORE #TrumpPence2016 #TrumpTrain
Both,788948675313696769,2016-10-20 03:43:16+00:00,erwoti,Can't wait to hear @realDonaldTrump call that Nasty Woman (Hillary Clinton) - Madam President #debatenight #ChrisWallace
Clinton,788948671756955650,2016-10-20 03:43:15+00:00,isaac_urner,"The Clinton campaign already has redirecting to their site. That's what a real campaign looks like.
#badhombres2016"
Update: in my first file I have:
import pandas as pd
import matplotlib.pyplot as plt
#2016-10-20 03:43:11+00:00
tweets_df = pd.read_csv('valid_tweets.csv')
tweets_df['Time'] = pd.Index(pd.to_datetime(tweets_df['Time'], utc=True)).tz_localize('UTC').tz_convert('US/Eastern')
tweets_df.to_csv('converted_timezone_tweets.csv', index=False)
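For comparison, here is a minimal sketch of the same conversion written with the .dt accessor, assuming a recent pandas where pd.to_datetime(..., utc=True) already returns timezone-aware UTC values (in that case an extra tz_localize('UTC') is not needed and raises a TypeError on tz-aware data). The row is the first line of valid_tweets.csv with placeholder tweet text, and an in-memory buffer stands in for the output file.

```python
import io

import pandas as pd

# First row of valid_tweets.csv; the tweet text is a placeholder.
csv_text = """Candidate,ID,Time,Username,Tweet
Clinton,788948653016842240,2016-10-20 03:43:11+00:00,Tamayo_castle,placeholder text
"""

tweets_df = pd.read_csv(io.StringIO(csv_text))

# Parse as tz-aware UTC, then convert to Eastern with the .dt accessor.
utc_times = pd.to_datetime(tweets_df["Time"], utc=True)
tweets_df["Time"] = utc_times.dt.tz_convert("US/Eastern")

# Write to an in-memory buffer instead of a real file for this sketch.
out = io.StringIO()
tweets_df.to_csv(out, index=False)
first_row = out.getvalue().splitlines()[1]
print(first_row)
```

The written timestamp carries a -04:00 offset, matching what the converted file contains.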
In my second file I have:
import pandas as pd
import matplotlib.pyplot as plt
time_interval = pd.offsets.Second(10)
fig, ax = plt.subplots(figsize=(6, 3.5))
ax = (
    pd.read_csv('converted_timezone_tweets.csv', engine='python', parse_dates=['Time'])
    .resample(time_interval, on='Time')['ID']
    .count()
    .plot.line(ax=ax)
)
plt.show()
After using engine='python', as suggested in one of the answers, I get this error:
/scratch/sjn/anaconda/bin/python /scratch2/debate_tweets/temporal_analysis.py
Traceback (most recent call last):
File "/scratch2/debate_tweets/temporal_analysis.py", line 11, in <module>
.resample(time_interval, on='Time')['ID']
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 4729, in resample
base=base, key=on, level=level)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/resample.py", line 969, in resample
return tg._get_resampler(obj, kind=kind)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/resample.py", line 1091, in _get_resampler
"but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Process finished with exit code 1
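The TypeError suggests that Time is still a plain object column when resample sees it, which can happen when parse_dates does not manage to convert the column to datetimes. Below is a minimal sketch of one common workaround, assuming every timestamp in the file carries a valid offset: parse the column explicitly with pd.to_datetime(..., utc=True), convert it to Eastern, and only then resample. The rows come from the converted file, with the tweet text replaced by placeholders.

```python
import io

import pandas as pd

# Rows from converted_timezone_tweets.csv; tweet text is a placeholder.
csv_text = """Candidate,ID,Time,Username,Tweet
Clinton,788948653016842240,2016-10-19 23:43:11-04:00,Tamayo_castle,placeholder a
Clinton,788948666501464064,2016-10-19 23:43:14-04:00,ThinkCenter1968,placeholder b
Both,788948675313696769,2016-10-19 23:43:16-04:00,erwoti,placeholder c
"""

df = pd.read_csv(io.StringIO(csv_text))

# utc=True normalizes every offset to UTC, so the result is a proper
# datetime column rather than an object column of mixed values.
df["Time"] = pd.to_datetime(df["Time"], utc=True).dt.tz_convert("US/Eastern")

# With a real datetime column, resample(on='Time') has what it needs.
counts = df.resample(pd.offsets.Second(10), on="Time")["ID"].count()
print(counts)
```

From here, counts.plot.line(ax=ax) would plug back into the plotting code from the second file.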
I did a vimdiff of the first 5 lines of each csv and this is what I get: