I'm working on a project where I need to add marker rows around each tagged sentence. Whenever the 'N' column equals 1, a new sentence has started. I want to add two rows for each sentence: a row with 'Pos' = START at the beginning of the sentence, and a row with 'Pos' = END at the end of the sentence. This is what the DataFrame looks like:

import pandas as pd

POSTAG = {
        'N': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9],
        'Name': ['ἐρᾷ','μὲν','ἁγνὸς','οὐρανὸς','τρῶσαι','χθόνα',',','ἔρως','δὲ','γαῖαν','λαμβάνει','γάμου','τυχεῖν','.','ὄμβρος','δ̓','ἀπ̓','εὐνάοντος','οὐρανοῦ','πεσὼν','ἔκυσε','γαῖαν','.','ἡ','δὲ','τίκτεται','βροτοῖς','μήλων','τε','βοσκὰς','καὶ','βίον','Δημήτριον','.','δενδρῶτις','ὥρα','δ̓','ἐκ','νοτίζοντος','γάμου','τέλειος','ἐστί','.'],
        'Pos': ['VERB','ADV','ADJ','NOUN','VERB','NOUN','PUNCT','NOUN','CCONJ','NOUN','VERB','NOUN','VERB','PUNCT','NOUN','ADV','ADP','ADJ','NOUN','VERB','VERB','NOUN','PUNCT','DET','ADV','VERB','NOUN','NOUN','ADV','NOUN','CCONJ','NOUN','ADJ','PUNCT','NOUN','NOUN','ADV','ADP','VERB','NOUN','ADJ','VERB','PUNCT']
        }

df = pd.DataFrame(POSTAG, columns = ['N', 'Name','Pos'])
print(df)

In this case I need a [NaN, NaN, START] row before index 0 and before index 14 (where each sentence starts), and a [NaN, NaN, END] row after index 13 (where the first sentence ends). I need to do this for the whole df. How could I do this?

Anurag Dabas
Tef Don
  • please provide your sample dataframe as code, not a picture – gold_cy Apr 13 '21 at 13:59
  • Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)...[Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) ... [You should not post code as an image because:...](https://meta.stackoverflow.com/a/285557/2823755) – wwii Apr 13 '21 at 14:17
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Apr 13 '21 at 14:17
  • `I don't really know how I could do this.` - did you search SO? Did you spend time with the [Pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html)? Which part are you having trouble with? Welcome to SO. This isn't a discussion forum or tutorial. Please take the [tour] and take the time to read [mre] and [ask] and the other links found on that page. – wwii Apr 13 '21 at 14:19
  • [Why is “Can someone help me?” not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) – wwii Apr 13 '21 at 14:22
  • Sorry I added the code. – Tef Don Apr 13 '21 at 14:46

1 Answer

Looking at your DataFrame, I assume you want to insert a START row before each row where N is 1, and an END row after the last row of each run of consecutive N values. If so, you could do the following.

First, create two single-row placeholder DataFrames, start_df and end_df:

import numpy as np

start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['END<-']})

Then split the DataFrame into groups, one per run of consecutive values in column N:

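# True wherever N does not follow its predecessor by exactly 1, i.e. at the first row of each sentence
mask = ~df['N'].diff().fillna(0).eq(1)

# The running count of True values gives every sentence its own group id
gb = df.groupby(mask.cumsum())
groups = [gb.get_group(x) for x in gb.groups]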
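
To see why this mask splits at sentence boundaries, here is a minimal sketch on a toy N column (the toy values are made up for illustration and are not taken from the question). It also shows a simpler key, `n.eq(1).cumsum()`, which gives the same group ids under the assumption that every sentence starts with N equal to 1.

```python
import pandas as pd

# Toy stand-in for the 'N' column: two sentences of 3 and 2 tokens.
n = pd.Series([1, 2, 3, 1, 2])

# diff() is NaN on the first row and 1 inside a sentence;
# any other value marks the first row of a new sentence.
mask = ~n.diff().fillna(0).eq(1)
group_ids = mask.cumsum()

# Assumption: each sentence restarts its numbering at 1,
# so flagging N == 1 directly yields the same ids.
alt_ids = n.eq(1).cumsum()

print(group_ids.tolist())
print(alt_ids.tolist())
```

Both keys label the first three rows as group 1 and the last two as group 2. The diff-based mask also handles numbering that restarts at a value other than 1, which the simpler key does not.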

Next, put the placeholder rows before and after each group:

res = []

for group in groups:
    res.append(start_df)
    res.append(group)
    res.append(end_df)

Finally, concatenate the DataFrames in the list into a single DataFrame:

df_ = pd.concat(res).reset_index(drop=True)
print(df_)

       N        Name      Pos
0    NaN         NaN  ->START
1    1.0         ἐρᾷ     VERB
2    2.0         μὲν      ADV
3    3.0       ἁγνὸς      ADJ
4    4.0     οὐρανὸς     NOUN
5    5.0      τρῶσαι     VERB
6    6.0       χθόνα     NOUN
7    7.0           ,    PUNCT
8    8.0        ἔρως     NOUN
9    9.0          δὲ    CCONJ
10  10.0       γαῖαν     NOUN
11  11.0    λαμβάνει     VERB
12  12.0       γάμου     NOUN
13  13.0      τυχεῖν     VERB
14  14.0           .    PUNCT
15   NaN         NaN    END<-
16   NaN         NaN  ->START
17   1.0      ὄμβρος     NOUN
18   2.0          δ̓      ADV
19   3.0         ἀπ̓      ADP
20   4.0   εὐνάοντος      ADJ
21   5.0     οὐρανοῦ     NOUN
22   6.0       πεσὼν     VERB
23   7.0       ἔκυσε     VERB
24   8.0       γαῖαν     NOUN
25   9.0           .    PUNCT
26   NaN         NaN    END<-
27   NaN         NaN  ->START
28   1.0           ἡ      DET
29   2.0          δὲ      ADV
30   3.0    τίκτεται     VERB
31   4.0     βροτοῖς     NOUN
32   5.0       μήλων     NOUN
33   6.0          τε      ADV
34   7.0      βοσκὰς     NOUN
35   8.0         καὶ    CCONJ
36   9.0        βίον     NOUN
37  10.0   Δημήτριον      ADJ
38  11.0           .    PUNCT
39   NaN         NaN    END<-
40   NaN         NaN  ->START
41   1.0   δενδρῶτις     NOUN
42   2.0         ὥρα     NOUN
43   3.0          δ̓      ADV
44   4.0          ἐκ      ADP
45   5.0  νοτίζοντος     VERB
46   6.0       γάμου     NOUN
47   7.0     τέλειος      ADJ
48   8.0        ἐστί     VERB
49   9.0           .    PUNCT
50   NaN         NaN    END<-
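
The steps above can also be wrapped into one reusable function. The following is only a sketch: the function name `add_sentence_markers` and its `start`/`end` parameters are hypothetical, the toy DataFrame is made up, and the grouping key assumes every sentence begins with N equal to 1.

```python
import numpy as np
import pandas as pd


def add_sentence_markers(df, start="->START", end="END<-"):
    """Return a new DataFrame with a marker row before and after each sentence.

    Hypothetical helper; assumes columns 'N', 'Name', 'Pos' and that
    each sentence starts at N == 1.
    """
    start_row = pd.DataFrame({"N": [np.nan], "Name": [np.nan], "Pos": [start]})
    end_row = pd.DataFrame({"N": [np.nan], "Name": [np.nan], "Pos": [end]})

    # One group id per sentence: increments each time N restarts at 1.
    sentence_id = df["N"].eq(1).cumsum()

    pieces = []
    for _, group in df.groupby(sentence_id, sort=False):
        pieces.extend([start_row, group, end_row])

    # ignore_index=True renumbers the rows 0..n-1 in the result.
    return pd.concat(pieces, ignore_index=True)


# Toy data (not from the question): one 2-token and one 1-token sentence.
toy = pd.DataFrame({"N": [1, 2, 1],
                    "Name": ["a", "b", "c"],
                    "Pos": ["NOUN", "VERB", "NOUN"]})
out = add_sentence_markers(toy)
print(out["Pos"].tolist())
```

As in the output above, the N column comes back as float because the marker rows hold NaN there.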
Ynjxsjmh