0

I asked this question yesterday but was unclear about a couple things so I'm reposting it here. Basically I have a dataframe of 13 columns and over 500 rows and I'm trying to add a header every x number of rows.

I'm a beginner so I've tried .concat and .append but I'm not sure if I really am doing it right

I have the variable header = ['Rk', 'Player', 'Age',...]

In: print(final.head())

out:
   index            Player Age   Tm Pos  GP   G   A    P +/- PPP    TOI
0      0   Nikita Kucherov  25  TBL  RW  82  41  87  128  24  41  19:58
1      4     Brad Marchand  30  BOS  LW  79  36  64  100  15  33  19:37
2      5     Sidney Crosby  31  PIT   C  79  35  65  100  18  20  21:00
3      6  Nathan MacKinnon  23  COL   C  82  41  58   99  20  31  22:05
4      7   Johnny Gaudreau  25  CGY  LW  82  36  63   99  18  29  20:04

I want to print the header every 48 rows, if I wanted to print it every 2 rows it would look like this:

In: print(final.head())

out:
   index            Player Age   Tm Pos  GP   G   A    P +/- PPP    TOI
0      0   Nikita Kucherov  25  TBL  RW  82  41  87  128  24  41  19:58
1      4     Brad Marchand  30  BOS  LW  79  36  64  100  15  33  19:37
                    Player Age   Tm  Pos GP   G   A    P  +/- PPP   TOI
2      5     Sidney Crosby  31  PIT   C  79  35  65  100  18  20  21:00
3      6  Nathan MacKinnon  23  COL   C  82  41  58   99  20  31  22:05
                    Player Age   Tm  Pos GP   G   A    P  +/- PPP   TOI
4      7   Johnny Gaudreau  25  CGY  LW  82  36  63   99  18  29  20:04

Note I don't really care about the what the value is for the index column for the header row when I insert multiple times, I'm pretty lenient for that part.

JackB
  • 31
  • 6
  • Hmmm, what is reason for it? Because only possible add header like rows, so get numeric values mixed with strings. So next processing of numeric data is not possible. – jezrael May 18 '19 at 15:57
  • Is this just for display purposes? As jezrael mentioned, mixing data types will not make it possible to do certain operations down the line. – busybear May 18 '19 at 16:13

1 Answers1

1

It is possible, but if need working later with data, not recommended, because if mixing numeric values mixed with strings some function should failed:

N = 2
#N = 48 with real data
#get index of added values, omit first value
idx = df.index[::N][1:]
#repeat columns to DataFrame
arr = np.broadcast_to(df.columns, (len(idx),len(df.columns)))
df1 = pd.DataFrame(arr, index=idx, columns=df.columns)

#append original and sorting by index
df = df1.append(df).sort_index().reset_index(drop=True)
print (df)
   index            Player  Age   Tm  Pos  GP   G   A    P  +/-  PPP    TOI
0      0   Nikita Kucherov   25  TBL   RW  82  41  87  128   24   41  19:58
1      4     Brad Marchand   30  BOS   LW  79  36  64  100   15   33  19:37
2  index            Player  Age   Tm  Pos  GP   G   A    P  +/-  PPP    TOI
3      5     Sidney Crosby   31  PIT    C  79  35  65  100   18   20  21:00
4      6  Nathan MacKinnon   23  COL    C  82  41  58   99   20   31  22:05
5  index            Player  Age   Tm  Pos  GP   G   A    P  +/-  PPP    TOI
6      7   Johnny Gaudreau   25  CGY   LW  82  36  63   99   18   29  20:04

EDIT For write each splitted DataFrame to separate sheet in one excel file use:

N = 2
#N = 48 with real data
with pd.ExcelWriter('file.xlsx') as writer:
    for i, df1 in enumerate(np.split(df, range(N, len(df), N))):
        df1.to_excel(writer, sheet_name=f'Sheet{i}', index=False)

EDIT1: For write all DataFrame to same sheetname:

#https://stackoverflow.com/a/33004253  + added index=False to df.to_excel
def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')   
    row = 0
    for dataframe in df_list:
        dataframe.to_excel(writer,sheet_name=sheets,startrow=row ,startcol=0, index=False)   
        row = row + len(dataframe.index) + spaces + 1
    writer.save()

N = 2
#N = 48 with real data
dfs = np.split(df, range(N, len(df), N))
multiple_dfs(dfs, 'Steetname1', 'file.xlsx', 1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • That was pretty close the only thing is my n value for this is going to be 48 (Or at least I think that's what it should be). When I am exporting this dataframe to excel, excel can fit 48 rows per page when printed. I want it so that this row will be the first row on every page. – JackB May 18 '19 at 18:03
  • @JackB So you need write each header to separate sheet? And for each excel sheet are 48 rows with header? – jezrael May 19 '19 at 06:53
  • Yes pretty much, I want this header to be at the top of each page when I print the dataframe in Excel – JackB May 20 '19 at 15:03
  • @JackB - Edited answer. – jezrael May 21 '19 at 06:24
  • That's closer except I want it all to be on the same sheet – JackB May 22 '19 at 16:44
  • I think the problem is that when I do that, every time I add that header, I actually need the next one to be one less row away. So the first one needs to be 48 rows away, then the next only 47 then 46 etc... Also I don't want to have the blank new line before the actual header. Thanks again! – JackB May 28 '19 at 17:28
  • @JackB - Sorry, but if need change number of rows per each group, the best is post new question. If need no empty row betwen only change `1` to `0` by `multiple_dfs(dfs, 'Steetname1', 'file.xlsx', 0)` – jezrael May 30 '19 at 11:51