
I have a csv file with one column and 300,000 individual lines of text, which I would like to convert into a list of lists, so that I get a list of 300,000 lists with every sentence readable as a string.

When I open the csv as a DataFrame and convert it into a Series, every sentence is split into letters.

import pandas as pd
sentence = pd.read_csv("myfile.csv", encoding='utf-8')
sentence = pd.Series([sentence])
sentence = sentence.tolist()

This gives:

[[('W', 'h', 'a', 't', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 's', 't', 'e', 'p'

Instead, I would like print(sentence) to show, for example:

[['What is the step by step approach for building a house?'], ['The first step is securing an adequate plot.'], ...]

Is there a simple way to do this?

twhale

2 Answers


You can probably skip read_csv entirely and read the file as a plain text file. See: How do I read a file line-by-line into a list?

In your case, you can discard the header line.
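A minimal sketch of that approach, assuming the file's first line is a header and every following line holds exactly one sentence. The helper name read_sentences, the header text "sentence", and the temporary sample file are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def read_sentences(path, skip_header=True):
    """Return a list of one-element lists, one per line of a single-column file."""
    with open(path, encoding="utf-8") as f:
        # splitlines() drops the trailing newline from each line
        lines = f.read().splitlines()
    if skip_header and lines:
        # discard the header row (read_csv would normally consume it)
        lines = lines[1:]
    # wrap each sentence in its own list to get a list of lists
    return [[line] for line in lines]


# Demo: write a small hypothetical file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "myfile.csv")
    with open(sample, "w", encoding="utf-8") as f:
        f.write(
            "sentence\n"
            "What is the step by step approach for building a house?\n"
            "The first step is securing an adequate plot.\n"
        )
    sentences = read_sentences(sample)

print(sentences)
# [['What is the step by step approach for building a house?'], ['The first step is securing an adequate plot.']]
```

One caveat: if a sentence contains a comma, CSV writers typically wrap that field in double quotes, and reading the file as plain text keeps those quotes, whereas the csv module or read_csv would remove them.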

Brian Kung

Since it's just one column, why not just open it as a regular text file?

with open('myfile.csv', encoding='utf-8') as f:
    df = pd.DataFrame([line.rstrip('\n') for line in f])  # strip newlines; file is closed afterwards
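If the goal is the list of lists itself rather than a DataFrame, a minimal sketch that keeps read_csv (which also parses quoted fields containing commas) and converts the single-column frame with to_numpy().tolist(). The in-memory text is a hypothetical stand-in for myfile.csv, and the sketch assumes the file has a header row; pass header=None to read_csv if it does not.

```python
import io

import pandas as pd

# Hypothetical stand-in for myfile.csv: a header row plus one sentence per line.
# The second sentence contains commas, so it is quoted as a CSV writer would do.
csv_text = (
    "sentence\n"
    "What is the step by step approach for building a house?\n"
    '"The first step, of course, is securing an adequate plot."\n'
)

# read_csv handles the quoting, so each sentence stays in a single field.
df = pd.read_csv(io.StringIO(csv_text))

# Each row of the one-column frame becomes a one-element list of str.
sentences = df.to_numpy().tolist()
print(sentences)
# [['What is the step by step approach for building a house?'], ['The first step, of course, is securing an adequate plot.']]
```

If a flat list of strings is enough, df.iloc[:, 0].tolist() returns that instead.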
r.ook