
I have a DataFrame with text in every cell. I want to iterate over the DataFrame and over the individual characters of each cell, filling a list with 0 for a whitespace character and 1 for any other character. I tried itertuples(), iterrows() and iteritems(), but with none of them could I access the individual characters of a string.

import pandas as pd

crispr = pd.DataFrame({'Name': ['Bob', 'Jane', 'Alice'], 
                       'Issue': ['Handling data', 'Could not read sample', 'No clue'],
                       'Comment': ['Need to revise data', 'sample preparation', 'could not find out where problem occurs']})

What I tried:

dflist = []
countchar= 0
for i,j in crispr.iteritems():
    for x in range(len(j)):
        test = j[countchar].isspace()
        countchar+=1
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

I also checked whether it would work with itertuples() or iterrows():

for i in crispr.itertuples():
    for j in i:
        for b in j:
            print(b)

This raises the following error:

 TypeError: 'int' object is not iterable  

The expected output is a nested list containing 1 for each character and 0 for each whitespace:

dflist = [[[1,1,1], [1,1,1,1], [1,1,1,1,1]], [[1,1,1,1,1,1,1,1,0,1,1,1,1], ...]]
hux0

1 Answer


Your posted code (before your last edit) was faulty: it contained several undefined names that raise different errors from the one you posted. I fixed it to:

dflist = []                    # added this
for i,j in crispr.items():     # items(); iteritems() was removed in pandas 2.0
    for x in range(len(j)):
        test = j[x].isspace()  # changed countchar to x
        # countchar+=1         # removed this
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)
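Note that this repaired loop removes the errors but still tests whole cells, not characters: `j[x]` is an entire string such as `'Handling data'`, so `str.isspace()` asks whether the whole cell is whitespace. A quick check, assuming the three-row `crispr` frame from the question (the loop variable names here are just for illustration):

```python
import pandas as pd

# The three-row frame from the question.
crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})

dflist = []
for _, column in crispr.items():
    for x in range(len(column)):
        cell = column[x]  # the whole string, e.g. 'Handling data'
        dflist.append(0 if cell.isspace() else 1)

print(dflist)  # [1, 1, 1, 1, 1, 1, 1, 1, 1]: one flag per cell

# Looping over the string itself is what reaches the individual characters.
print([0 if ch.isspace() else 1 for ch in 'No clue'])  # [1, 1, 0, 1, 1, 1, 1]
```

That missing inner loop over the characters of each cell is what the solution further down adds.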

for i in crispr.itertuples():
    for j in i:
        for b in j:  # this produces your error
            print(b)

itertuples() yields one tuple per row whose first element is the row index. If you inspect the first value of j you'll see it is the integer 0, hence the error: an int cannot be iterated.
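A minimal sketch of that point, again assuming the question's `crispr` frame: the index really is the first field of each tuple, and `itertuples(index=False)` drops it so every remaining field is a string cell. This is a row-wise alternative (one list per row) to the column-wise solution; the names `first_row` and `rows` are only illustrative.

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})

# Each tuple starts with the row index, so the first field of row 0 is the int 0.
first_row = next(crispr.itertuples())
print(first_row[0])  # 0

# index=False leaves only the cell strings, which can be iterated character by character.
rows = []
for row in crispr.itertuples(index=False):
    rows.append([[0 if ch.isspace() else 1 for ch in cell] for cell in row])

print(rows[0])
```

The cells inside each row follow the DataFrame's column order, so the nesting is row-major rather than the column-major layout of the solution below.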

Solution:

import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation', 
                'could not find out where problem occurs']})

print(crispr)
outer_list = []
for i,j in crispr.items():     # items(); iteritems() was removed in pandas 2.0
    dflist = []
    for word in j:
        wordlist = [] 
        for char in word:
            if char.isspace():
                wordlist.append(0)
            else:
                wordlist.append(1)
        dflist.append(wordlist)
    outer_list.append(dflist)

print(outer_list)
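The same nested loops can also be written more compactly. A sketch, not a drop-in replacement: `char_flags` and `flags` are hypothetical names, the list comprehension reproduces the column-major `outer_list`, and the second variant keeps each cell's flags aligned with the DataFrame's row and column labels via `Series.map`.

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})


def char_flags(text):
    """Return 0 for each whitespace character in text and 1 for every other character."""
    return [0 if ch.isspace() else 1 for ch in text]


# One list per column, one list per cell, as in the loop version.
outer_list = [[char_flags(cell) for cell in crispr[col]] for col in crispr.columns]

# Alternative: a DataFrame whose cells hold the flag lists.
flags = pd.DataFrame({col: crispr[col].map(char_flags) for col in crispr.columns})
print(flags.loc[0, 'Name'])  # [1, 1, 1]
```

The DataFrame variant is handy if you later need to look up the flags for a specific row and column instead of by list position.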

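Output (comments added for clarity; the columns appear in alphabetical order because older pandas versions sorted dict keys, whereas current versions keep the order Name, Issue, Comment):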

                                   Comment                  Issue   Name
0                      Need to revise data          Handling data    Bob
1                       sample preparation  Could not read sample   Jane
2  could not find out where problem occurs                No clue  Alice

# Comment
[[[1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 
   1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]], 
 # Issue
 [[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], 
  [1, 1, 0, 1, 1, 1, 1]],
 # Name 
 [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1]]]

This should do what you want.

Patrick Artner