I have a dataframe, df, with the columns 'Page', 'Word', and 'LineNum':
df =
Idx  Page  Word      LineNum
0    1     Hello     1
1    1     This      1
2    1     is        2
4    1     an        2
5    2     example   1
6    2     of        1
7    2     words     1
8    2     across    2
9    2     multiple  2
10   3     pages     1
11   3     in        1
12   3     the       1
13   4     document  1
14   4     which     1
15   4     has       1
16   4     split     1
This dataframe was extracted from a CSV file and describes the words in the document.
As you can imagine, several words appear on the same line (they share the same LineNum value), and each page has several such lines. LineNum starts again at 1 on every page.
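Because LineNum restarts on each page, a line is only identified by the pair (Page, LineNum). A minimal check, assuming df is rebuilt from the sample table above (the Idx column is dropped and a default index is used), is to count the words per pair and compare with grouping by LineNum alone:

```python
import pandas as pd

# Sample data copied from the table above (Idx replaced by the default index).
df = pd.DataFrame({
    "Page": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "Word": ["Hello", "This", "is", "an", "example", "of", "words", "across",
             "multiple", "pages", "in", "the", "document", "which", "has", "split"],
    "LineNum": [1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
})

# One group per physical line: (Page, LineNum) pairs.
words_per_line = df.groupby(["Page", "LineNum"]).size()
print(words_per_line)

# Grouping by LineNum alone merges lines from different pages into only 2 groups.
print(df.groupby("LineNum").ngroups)
```

The first print lists six lines; the second shows why grouping on LineNum by itself would mix words from different pages.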
This is what I want to do:
for each page in the dataframe:
    for each LineNum on that page:
        FullLine = all the words in df['Word'] on that line, joined with spaces
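The pseudocode translates fairly directly into two nested groupby loops. This is a sketch under the assumption that words within a line are already in reading order in df; the names rows and LineDF are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Page": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "Word": ["Hello", "This", "is", "an", "example", "of", "words", "across",
             "multiple", "pages", "in", "the", "document", "which", "has", "split"],
    "LineNum": [1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
})

rows = []
# Outer loop: one pass per page (groupby sorts the keys by default).
for page, page_df in df.groupby("Page"):
    # Inner loop: one pass per line number within that page.
    for line_num, line_df in page_df.groupby("LineNum"):
        # Join the words of this line with single spaces; groupby keeps row order.
        rows.append({"Page": page, "LineNum": line_num,
                     "FullLine": " ".join(line_df["Word"])})

LineDF = pd.DataFrame(rows)
print(LineDF["FullLine"])
```

Explicit loops are easy to follow, but they get slow on large frames; a single groupby over both keys does the same work in one call.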
Desired output (a new dataframe, LineDF, with one FullLine row per line):
LineDF =
Idx  FullLine
0    Hello This
1    is an
2    example of words
3    across multiple
4    pages in the
5    document which has split
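One possible vectorized way to get this output, assuming the words are already in reading order, is to group by both Page and LineNum and join the words of each group. sort=False keeps lines in the order they first appear in df:

```python
import pandas as pd

df = pd.DataFrame({
    "Page": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "Word": ["Hello", "This", "is", "an", "example", "of", "words", "across",
             "multiple", "pages", "in", "the", "document", "which", "has", "split"],
    "LineNum": [1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
})

LineDF = (
    df.groupby(["Page", "LineNum"], sort=False)["Word"]
      .agg(" ".join)                  # concatenate the words of each line
      .reset_index(name="FullLine")   # flatten back to columns Page, LineNum, FullLine
)

# Keep only the FullLine column to match the desired output exactly.
print(LineDF[["FullLine"]])
```

The Page and LineNum columns are kept in LineDF in case you need to know where each line came from; drop them with LineDF[["FullLine"]] if not.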
I am about two weeks into pandas and would much appreciate an expert's response. Thank you, Venkat