How to extract features from text data set?

Question

I am trying to tokenize a text file that I extracted from a zip folder, but I am getting this error:

My Error

TypeError: expected string or bytes-like object

Does this answer your question? [Dataframe encoding](https://stackoverflow.com/questions/30156012/dataframe-encoding) — shimo, Jan 12 '20 at 11:51

zamir · Answer 1 · 2020-01-12T11:58:56.320

0

Prefix the path string with r, i.e. r"C:\Users\killer\Desktop\User1.txt", so that each backslash is treated as a literal \ (as if you had written \\). Without the prefix, the \U in \Users is interpreted as the start of a Unicode escape sequence.

pd.read_csv(r"C:\Users\killer\Desktop\User1.txt")

Alternatively, you can escape each backslash manually (\\) or just change \ to /.
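To see the difference the r prefix makes, here is a minimal sketch that feeds both spellings of the assignment to Python's built-in compile(). The helper name compiles and the "<demo>" label are made up for illustration. The path is the one from the question.

```python
# Source text of two assignments, held in raw strings so that this demo
# file itself stays valid Python.
plain_source = r'path = "C:\Users\killer\Desktop\User1.txt"'
raw_source = r'path = r"C:\Users\killer\Desktop\User1.txt"'


def compiles(source):
    """Return True if the given source text is valid Python (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


# In a normal string, \U must be followed by eight hex digits, which "sers\kil" is not.
plain_ok = compiles(plain_source)
# The r prefix switches escape processing off, so the same characters are accepted.
raw_ok = compiles(raw_source)

print(plain_ok, raw_ok)
```

The first spelling is rejected before any file is opened, so the problem is in the string literal and not in the file.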

edited Jan 12 '20 at 11:58

answered Jan 12 '20 at 11:53

zamir

score 0 · Answer 2 · edited Jan 12 '20 at 13:27

What you are doing is right, but the path string contains characters that Python cannot decode. The \U in your file path (from \Users) is treated by default as the start of an escape sequence, and what follows it is not a valid one. For the path to be read as you wrote it, you have to do one of the following:

A) Write it with \\, e.g. "C:\\Users\\killer\\..."

B) Write it with /, e.g. "C:/Users/killer/..."

C) Put r in front, e.g. r"C:\Users\killer\...", to make it a raw string, i.e. backslashes are kept as plain text and no escape sequences are processed. Note that a raw string cannot end in a single backslash.
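As a quick check that the three spellings name the same file, here is a small sketch using the full path from the question. PureWindowsPath from the standard library is used only for comparison and works on any operating system. The variable names are made up for illustration.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\killer\\Desktop\\User1.txt"  # option A: doubled backslashes
forward = "C:/Users/killer/Desktop/User1.txt"      # option B: forward slashes
raw = r"C:\Users\killer\Desktop\User1.txt"         # option C: raw string

# Options A and C produce exactly the same string.
same_string = escaped == raw

# Option B is a different string, but Windows path rules treat it as the same path.
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)

# A raw string cannot end in a single backslash, so a trailing one is added separately.
folder = r"C:\Users\killer" + "\\"

print(same_string, same_path, folder)
```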

score 0 · Answer 3 · edited Jan 12 '20 at 12:14

Try the following code:

Data = pd.read_csv(r"C:\Users\killer\Desktop\User1.txt", sep=", ")

Just add sep=", " as an extra argument after the path of the file you want to read.

Note that the quotation marks should contain whatever separates the fields in the text. In most cases that is a comma (","), but you can open the file in a text editor to check which separator it actually uses.
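If you would rather check the separator in code than by eye, something like the following rough sketch could help. The helper guess_separator, its list of candidates, and the sample line are all made up for illustration; the real contents of User1.txt are not known here.

```python
def guess_separator(first_line, candidates=(",", ";", "\t", "|")):
    """Return the candidate that occurs most often in the line (hypothetical helper)."""
    return max(candidates, key=first_line.count)


# Made-up line standing in for the first line of the real file.
sample_line = "name, age, city"

sep = guess_separator(sample_line)
# Split on the guessed separator and drop the spaces around each field.
fields = [field.strip() for field in sample_line.split(sep)]

print(repr(sep), fields)
```

For a real file you would pass in its first line, for example the result of readline() on the opened file, and then give the result to read_csv through sep. This only counts characters, so it can be fooled by separators that also appear inside the text itself.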

edited Jan 12 '20 at 12:14

Tomer

answered Jan 12 '20 at 12:04

Gerald Hoxha
