
I am a newbie and I have a CSV file that contains the Reddit account name, subreddit, time and message for each post.

I read it with:

train_data = pd.read_csv("addres/train_data.csv", encoding="utf8")

If I call train_data.head(), I see:

[image: output of train_data.head()]

Is there a way to create an array of [author, body] pairs?

To begin with, I tried to create two lists (author and message) this way:

train=open("addres/train_data.csv")
train.readline()
author=[]
message=[]
for line in train:
    autore,categoria,ora, messaggio=line.split(",")
    author.append(autore)
    message.append(messaggio)

But the messages themselves contain ",", so splitting on commas does not work properly.

Thank you and sorry for the silly question.

snakecharmerb
MementoMori

1 Answer

df_tmp = train_data[['author', 'body']] # allows you to select subset by column name
content_array = [list(x) for x in df_tmp.values] # a list of lists ([ith_author, ith_body])

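Keep in mind that selecting columns with a list, as in train_data[['author', 'body']], normally gives you a new DataFrame rather than a view, but pandas may still raise a SettingWithCopyWarning if you modify df_tmp afterwards. If you plan to change it, make an explicit copy with .copy(). If you need the rows to be immutable, you can build tuples instead of lists.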
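As a rough illustration of the above, here is a minimal sketch on made-up data. The sample rows and the column names subreddit and created_utc are assumptions, since only author and body are known from the question. It also shows that pd.read_csv copes with a quoted message that contains a comma.

```python
import io

import pandas as pd

# Hypothetical stand-in for train_data.csv; only 'author' and 'body'
# are known column names, the other two are assumed for illustration.
csv_text = (
    "author,subreddit,created_utc,body\n"
    'alice,python,1575050000,"Hello, world"\n'
    "bob,pandas,1575050100,No commas here\n"
)
train_data = pd.read_csv(io.StringIO(csv_text))

# Explicit copy, so later changes to df_tmp cannot touch train_data.
df_tmp = train_data[["author", "body"]].copy()

# List of [author, body] lists.
content_array = df_tmp.values.tolist()

# Same rows as immutable tuples.
content_tuples = list(df_tmp.itertuples(index=False, name=None))

# Modifying the copy leaves the original DataFrame unchanged.
df_tmp.loc[0, "body"] = "changed"
```

Here content_array and content_tuples are built before the modification, so they still hold the original messages.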

darthbhyrava
  • Hi, thank you. One more question: now I'm only interested in the messages. I did as you said and it works, but I get a list like [["message1"], ["message2"], ...]. I want something like ["message1", "message2", ...] instead. How can I do that? – MementoMori Nov 29 '19 at 19:24
  • Glad I could help. If the answer solves your problem, please *accept* it. As for how to flatten a list of lists into a list, please look at [this](https://stackoverflow.com/a/952952/5070837) answer. – darthbhyrava Nov 29 '19 at 19:34
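For the follow-up in the comments, a small sketch of two possible ways to get a flat list of messages. The sample DataFrame is invented for illustration. The first approach flattens the nested list with itertools.chain, the second avoids the nesting by selecting the single column as a Series.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
train_data = pd.DataFrame(
    {"author": ["alice", "bob"], "body": ["message1", "message2"]}
)

# Selecting with a list of one column yields one-element rows.
nested = [list(x) for x in train_data[["body"]].values]

# Option 1: flatten the list of lists.
flat_from_nested = list(chain.from_iterable(nested))

# Option 2: select the column as a Series and convert it directly.
flat_direct = train_data["body"].tolist()
```

If only one column is needed, the second option is probably simpler, since no nested list is ever created.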