-2
df = pd.read_csv('filename.csv')
corpus = df.corpus

How can I combine a series of text strings (from one column) into a single-element list?

From column 'corpus':
row 1: Hail Mary.
row 2: Hi Bob.
row 3: Hey Sue.

into ['Hail Mary. Hi Bob. Hey Sue.']

I'm looking for a list with len(list) == 1.

Joe
spacedustpi

4 Answers

2

If I understood you correctly:

df = pd.read_csv(your_file)
l = [' '.join(df['col'])]

Input:

          col
0  Hail Mary.
1     Hi Bob.
2    Hey Sue.

Output:

['Hail Mary. Hi Bob. Hey Sue.']
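
One thing to watch for: `' '.join(...)` raises a `TypeError` if the column contains missing values, since those are not strings. A minimal sketch of one way around that, assuming the column is named `corpus` as in the question (the in-memory DataFrame and the `None` row are made up for illustration), is to use `Series.str.cat`, which skips missing values when `na_rep` is not set:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the column name 'corpus' follows the
# question; the None row is added only to show how missing values behave.
df = pd.DataFrame({"corpus": ["Hail Mary.", "Hi Bob.", None, "Hey Sue."]})

# With no `others` argument, Series.str.cat concatenates every value in the
# column into one string. Missing values are omitted because na_rep is unset.
combined = [df["corpus"].str.cat(sep=" ")]

print(combined)
print(len(combined))
```

Wrapping the result in `[...]` is what gives the one-element list asked for in the question.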
Joe
1
import pandas as pd

df = pd.read_csv('example.csv')

result = [' '.join([row for row in df['column_name']])]

Output of result:

['Hail Mary. Hi Bob. Hey Sue.']
chriscberks
0

test.csv:

Hail Mary.
Hi Bob.
Hey Sue.

python:

data = []
# Open in text mode so each line is read as a str (Python 3).
with open('test.csv', 'r') as csvfile:
    for row in csvfile:
        data.append(row.strip())  # drop the trailing newline
print(data)

output: ['Hail Mary.', 'Hi Bob.', 'Hey Sue.']

sur.la.route
  • As per OP's comments, the combine should work on rows and not columns. `row` in your case will contain all the columns in the single row. Hopefully, you can edit your answer once he posts something otherwise it will be downvoted – mad_ Aug 23 '18 at 16:45
  • @Christopher, this sort of works, but I'd skip the first row because it brings in the column labels into the list. In addition, the list length comes out to 5, I was looking for length of 1. – spacedustpi Aug 23 '18 at 17:35
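
For what it's worth, both points in that last comment (the header row ending up in the list, and the length not being 1) could probably be handled without pandas. Here is a rough sketch using the standard library's `csv.DictReader`; the file contents and the `corpus` header are assumptions based on the question, and `io.StringIO` just stands in for a real `open('test.csv', newline='')` so the snippet runs on its own:

```python
import csv
import io

# Hypothetical file contents: a header row plus one text column named 'corpus'.
csv_text = "corpus\nHail Mary.\nHi Bob.\nHey Sue.\n"

# io.StringIO stands in for open('test.csv', newline='') in this sketch.
with io.StringIO(csv_text) as csvfile:
    # DictReader consumes the first row as field names, so the header
    # never ends up in the data.
    reader = csv.DictReader(csvfile)
    # Join every row's 'corpus' value into one string, then wrap it in a list.
    result = [" ".join(row["corpus"] for row in reader)]

print(result)
print(len(result))
```

The generator inside `join` reads the rows one at a time, so no intermediate list of rows is built before the final string.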
-1
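    # ws appears to be an openpyxl worksheet; it is not defined in this answer.
    my_list = []

    # Rows are 1-indexed; use max_row + 1 so the last row is included.
    for i in range(1, ws.max_row + 1):
        value = ws.cell(row=i, column=1).value   # assuming the value is in the 1st column
        if len(my_list) == 0:
            my_list.append(value)
        else:
            my_list[0] += ' ' + value   # separate entries with a space

    for i in my_list:
        print(i)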
Somesh
  • This isn't pandas syntax, it looks like you're using an Excel reader. I don't understand how this deals with the OP's problem – roganjosh Aug 23 '18 at 16:28
  • I'm working with millions of lines, so I would not use this option even if we worked the kinks out. For loops are notoriously slow with big data sets. – spacedustpi Aug 23 '18 at 17:37