5

I have the following DataFrame:

   no      word    status
0   0       one  to_check
1   1       two  to_check
2   2        :)  emoticon
3   3       dr.  to_check
4   4  "future"  to_check
5   5        to  to_check
6   6        be  to_check
6 6 be to_check

I want to iterate through each row, find quotes at word-initial and word-final positions, and create a DataFrame like this:

   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "    quotes
5   4  future      word
6   4       "    quotes
7   5      to  to_check
8   6      be  to_check

I can strip the quotes and split the word into three pieces, but I end up with the DataFrame below, in which the new rows overwrite the last two rows:

   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "    quotes
5   4  future      word
6   4       "    quotes

I tried df.loc[index], df.iloc[index] and df.at[index], but none of them let me extend the number of rows in the DataFrame.

Is it possible to add new rows at a specific index without overwriting the last two rows?

3 Answers

6

In your case you can split the word on the quote character and then explode the resulting lists into rows:

out = (df.assign(word=df.word.str.split(r'(\")'))
         .explode('word')
         .loc[lambda x: x['word'] != ''])
   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "  to_check
4   4  future  to_check
4   4       "  to_check
5   5      to  to_check
6   6      be  to_check

To change the status of the quote rows:

import numpy as np

out['status'] = np.where(out['word'].eq('"'), 'quotes', out['status'])
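The two steps above leave the split word itself as to_check and keep the repeated index 4. Here is a minimal sketch that also covers those points, assuming the inner word should get the status word and the index should be renumbered as in the question's expected output. The sample frame is rebuilt from the question, and the was_quoted helper column is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "no": range(7),
    "word": ["one", "two", ":)", "dr.", '"future"', "to", "be"],
    "status": ["to_check", "to_check", "emoticon", "to_check",
               "to_check", "to_check", "to_check"],
})

# Remember which rows were wrapped in quotes before they are split.
quoted = df["word"].str.match(r'^".+"$')

out = (
    df.assign(word=df["word"].str.split(r'(")'), was_quoted=quoted)
      .explode("word")                      # one row per piece
      .loc[lambda x: x["word"] != ""]       # drop the empty edge pieces
      .assign(status=lambda x: np.where(
          x["word"].eq('"'), "quotes",      # the quote characters
          np.where(x["was_quoted"], "word", x["status"])))  # the inner word
      .drop(columns="was_quoted")
      .reset_index(drop=True)               # renumber 0..n-1
)
print(out)
```

Doing the status update inside the chain with assign avoids writing to a filtered copy afterwards.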
BENY
0

This is not very efficient, but I think it is very readable, so you could use it if efficiency is not a major concern.

import pandas as pd

# One list per output column; rows are appended in order.
no_lst = []
word_lst = []
status_lst = []

def check_quote(w):
    # True when the word both starts and ends with a double quote.
    return w.startswith('"') and w.endswith('"')

for i, row in enumerate(df.itertuples()):
    word = getattr(row, "word")
    status = getattr(row, "status")
    
    if check_quote(word):
        no_lst += [i,i,i]
        stripped_w = word.strip('"')
        
        word_lst.append('"')
        word_lst.append(stripped_w)
        word_lst.append('"')
        
        status_lst += ["quotes", "word", "quotes"]
        continue
        
    no_lst.append(i)
    word_lst.append(word)
    status_lst.append(status)

new_df = pd.DataFrame({"no":no_lst,
                   "word":word_lst,
                   "status":status_lst})
haneulkim
0

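You could try a function like this, which splits the DataFrame at position idx and concatenates a new quote row between the two parts instead of overwriting an existing row:

import pandas as pd

def InsertRow(df, idx):
    # Split the frame at position idx.
    df1 = df.iloc[:idx]
    df2 = df.iloc[idx:]

    # Build the new row as a one-row DataFrame so each column keeps its own dtype.
    new_row = pd.DataFrame([[idx - 1, '"', 'quotes']], columns=df.columns)

    # Join the pieces; the new row lands at position idx and nothing is overwritten.
    return pd.concat([df1, new_row, df2], ignore_index=True)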
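The slice-and-concat idea can also be applied directly to the question's case, where one row has to become three. The following is only a sketch under assumptions: replace_row_with_rows is a hypothetical helper name, the sample frame is rebuilt from the question, and the quoted word is assumed to sit at position 4.

```python
import pandas as pd

def replace_row_with_rows(df, pos, rows):
    """Return a new frame in which the row at position `pos` is replaced by `rows`."""
    new_rows = pd.DataFrame(rows, columns=df.columns)
    # Everything before pos, then the new rows, then everything after pos.
    return pd.concat([df.iloc[:pos], new_rows, df.iloc[pos + 1:]],
                     ignore_index=True)

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "no": range(7),
    "word": ["one", "two", ":)", "dr.", '"future"', "to", "be"],
    "status": ["to_check", "to_check", "emoticon", "to_check",
               "to_check", "to_check", "to_check"],
})

pos = 4                                   # position of the quoted word
no = int(df["no"].iloc[pos])
inner = df["word"].iloc[pos].strip('"')   # 'future'

result = replace_row_with_rows(
    df, pos,
    [[no, '"', "quotes"], [no, inner, "word"], [no, '"', "quotes"]],
)
print(result)
```

Because pd.concat builds a new frame rather than assigning into existing positions, the rows after the insertion point are shifted down instead of being overwritten.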