
Here is what I've been trying, but it returns ["NOTES1", "NOTES2", "NOTES3"] instead of the contents of the dataframe columns:

df_word_list = []
df_notes = df[["NOTES1" , "NOTES2", "NOTES3"]]
one_list = list(flatten(df_notes.values.tolist()))

for word in df_notes:
    df_word_list.append(word)
print(df_word_list)

Does this mean the dataframe isn't being read in correctly? Thanks

  • If `df_notes` is a dataframe, then iterating directly over it will give you the column names. – juanpa.arrivillaga Mar 25 '20 at 20:41
  • Please have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a [mcve] including sample input and your expected output. For iterating over a dataframe, you need `iterrows()` but I can't see a reason you'd need to iterate based on what you've posted – G. Anderson Mar 25 '20 at 20:44
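
To illustrate the first comment, here is a minimal sketch using a small, hypothetical two-row frame with the question's column names. Looping over the DataFrame itself yields only the column labels; the cell values have to be reached through the rows, here via `.to_numpy()`.

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df_notes = pd.DataFrame({
    "NOTES1": ["annual report", "business 10-k"],
    "NOTES2": ["all of these", "state are"],
    "NOTES3": ["we urge and", "we urge you to"],
})

# Looping over a DataFrame walks its column labels, not its cells
names = [col for col in df_notes]
print(names)  # ['NOTES1', 'NOTES2', 'NOTES3']

# The cell values live in the rows; .to_numpy() exposes them as a 2-D array
values = [cell for row in df_notes.to_numpy() for cell in row]
print(values)
```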

1 Answer


It looks like you are trying two different ways of getting the words in a dataframe into a single list?

import pandas as pd

data = [{"NOTES1": "annual report",
 "NOTES2": "all of these",
 "NOTES3": "we urge and"},
{"NOTES1": "business 10-k",
 "NOTES2": "state are",
 "NOTES3": "we urge you to"},
{"NOTES1": "business annual ",
 "NOTES2": "all of these",
 "NOTES3": "we various"}]
df = pd.DataFrame(data)

# should probably call this word_list
df_word_list = []

# I'm assuming your data looks like above
df_notes = df[["NOTES1" , "NOTES2", "NOTES3"]]

Where are you getting `flatten` from?

# one_list = list(flatten(df_notes.values.tolist()))
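
`flatten` is not a Python built-in, so that line raises a `NameError` unless it is imported from somewhere. If the goal is only to remove one level of nesting from the row lists, the standard library's `itertools.chain.from_iterable` can stand in for it. This is a swapped-in alternative, not necessarily the `flatten` the question had in mind, and the frame below is hypothetical sample data.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data using the question's column names
df_notes = pd.DataFrame({
    "NOTES1": ["annual report", "business 10-k"],
    "NOTES2": ["all of these", "state are"],
    "NOTES3": ["we urge and", "we urge you to"],
})

# .values.tolist() gives one inner list per row;
# chain.from_iterable strips exactly one level of nesting
one_list = list(chain.from_iterable(df_notes.values.tolist()))
print(one_list)
```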

1) I think you are trying to flatten a list of lists? You can do this with a list comprehension:

flat_list1 = [item for sublist in df_notes.values.tolist() for item in sublist]

print(flat_list1)
# ['annual report', 'all of these', 'we urge and', 'business 10-k', 'state are', 'we urge you to', 'business annual ', 'all of these', 'we various']
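
If you would rather not write the comprehension, NumPy can flatten the underlying 2-D array directly. A minimal sketch, assuming the same column layout (the two-row frame is hypothetical sample data): `ravel()` flattens row by row, giving the same order as the list comprehension, while `ravel(order="F")` goes column by column instead.

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df_notes = pd.DataFrame({
    "NOTES1": ["annual report", "business 10-k"],
    "NOTES2": ["all of these", "state are"],
    "NOTES3": ["we urge and", "we urge you to"],
})

# to_numpy() returns a 2-D array; ravel() flattens it row by row (C order)
row_major = df_notes.to_numpy().ravel().tolist()

# order="F" flattens column by column: all NOTES1, then NOTES2, then NOTES3
column_major = df_notes.to_numpy().ravel(order="F").tolist()

print(row_major)
print(column_major)
```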

Or, equivalently, with two nested `for` loops:

flat_list2 = []
for sublist in df_notes.values.tolist():
    print(sublist)
    for item in sublist:
        print(item)
        flat_list2.append(item)


print(flat_list2)
# ['annual report', 'all of these', 'we urge and', 'business 10-k', 'state are', 'we urge you to', 'business annual ', 'all of these', 'we various']

2) I think you are trying to iterate through each row? Another way you could do it is with `iterrows()`:

word_list = []
for row_num, row_series in df_notes.iterrows():
    print("Row Number:\t", row_num)
    row_list = row_series.tolist()
    print("Row Data:\t", row_list)
    # extend() adds this row's words after the ones already collected,
    # so the rows stay in their original order
    word_list.extend(row_list)

print(word_list)
# ['annual report', 'all of these', 'we urge and', 'business 10-k', 'state are', 'we urge you to', 'business annual ', 'all of these', 'we various']
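
A possible alternative to `iterrows()` is `itertuples(index=False)`, which yields one namedtuple of cell values per row instead of building a Series for each row, and is generally the faster of the two. A minimal sketch, using a hypothetical two-row frame with the question's column names:

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df_notes = pd.DataFrame({
    "NOTES1": ["annual report", "business 10-k"],
    "NOTES2": ["all of these", "state are"],
    "NOTES3": ["we urge and", "we urge you to"],
})

word_list = []
# Each row is a namedtuple holding only the cell values (index=False drops the index);
# extend() iterates over its fields and appends them in column order
for row in df_notes.itertuples(index=False):
    word_list.extend(row)

print(word_list)
```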