I am trying to read a text file that contains public Instagram image posts and their metadata. Each line holds one complete post along with all of its metadata, and part of each post is written in Arabic. When I read the file with Python and print a line, the Arabic text does not show up; instead it appears as escaped bytes such as \xd9\x8a\xd8.
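A quick check, assuming Python 3, suggests these escape sequences are not corrupted data but the raw UTF-8 bytes of the Arabic letters: the letter ي (U+064A, ARABIC LETTER YEH) encodes to exactly the two bytes \xd9\x8a. The variable names below are illustrative only.

```python
# Assumes Python 3: str holds Unicode text, bytes holds raw encoded data.
letter = "\u064a"  # ARABIC LETTER YEH, the same character as "ي"

encoded = letter.encode("utf-8")  # str -> bytes
print(encoded)  # prints the bytes repr b'\xd9\x8a', not the letter itself

decoded = encoded.decode("utf-8")  # bytes -> str restores the original text
print(decoded == letter)  # True
```

In other words, the text is intact; it is only being displayed in its encoded byte form.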
This is the code snippet I am using to read from the .txt file:
import codecs

# Open the file and decode its contents as UTF-8 while reading
test_file = codecs.open('instagram_info.txt', mode='r', encoding='utf-8')
print("reading images URLs file")
counter = 0
for line in test_file:
    print("Line: ", line.encode("utf-8"))
    counter += 1
    print(counter)
    # Stop after the first 50 lines
    if counter == 50:
        break
test_file.close()
This is an example line from the text file:
100158441 25.256887893 51.507485363 Centerpoint 4f09c7a6e4b090ef234993e3 http://scontent.cdninstagram.com/hphotos-xpa1/outbound-distilleryimage9/t0.0-17/OBPTH/9ecde7ecac7811e3b87a12bcaa646ac5_8.jpg sarrah80 25.256887893 51.507485363 2014-03-15 19:37:45 1394912265 16144 ولا راضي يوقف يم الارنوب عشان اصوره dody_nasser said "هههه اكيد خايف الجبان " nassersahim said "@sarrah80 يبغي يملغ عليكم" sarrah80 said "@dody_nasser بطل ولدي بس خبرج المود ومايسوي" sarrah80 said "@nassersahim انت شفت الأرنب شلون يطالعه ذبحني من الضحك " arwa9009 said "حياتي" fatimaaljasssim said "حياتتتتتتتنتتي عليهم فديتهم" 6 non_al3yooon,mun.mun_almalki,__manoor__,monaalalii 46
Also, the current code adds a "b'" prefix to every line it prints. Any idea why this is happening?
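A minimal sketch of the likely cause, assuming Python 3: a file opened with encoding='utf-8' already yields decoded str lines, and calling line.encode("utf-8") turns each one back into a bytes object, whose printed form starts with b' and shows non-ASCII characters as \x escapes. This sketch swaps codecs.open for the built-in open (both decode to str here) and writes a hypothetical sample line to a temporary file instead of reading instagram_info.txt, so it runs on its own. Printing the Arabic directly assumes the terminal uses a UTF-8-capable encoding.

```python
import os
import tempfile

# Hypothetical sample line; the real data would come from instagram_info.txt.
sample = "100158441 ولا راضي يوقف يم الارنوب\n"

# Write the sample to a temporary UTF-8 file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

lines = []
with open(path, mode="r", encoding="utf-8") as test_file:
    for counter, line in enumerate(test_file, start=1):
        # `line` is already a decoded str, so printing it shows the Arabic text.
        print("Line:", line.rstrip("\n"))
        # .encode() converts it back to bytes, whose printed form starts with b'.
        print("Encoded:", line.encode("utf-8"))
        lines.append(line)
        if counter == 50:  # same 50-line limit as the original loop
            break

os.remove(path)  # clean up the temporary file
```

Using a with block also closes the file automatically, even if the loop exits early through break.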