Printing values with Pandas

Question

First of all, I am totally new to Python, so maybe this is something super simple that I am not doing correctly.

I am reading an xlsx file with multiple worksheets and loading each of them into a separate DataFrame (at least, I think I am).

import pandas as pd

xl = pd.ExcelFile("results/report.xlsx")
d = {}  # maps each sheet name to its DataFrame
for sheet in xl.sheet_names:
    d[sheet] = pd.read_excel(xl, sheet_name=sheet)



lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg3_results'].columns[lista_colunas].values

This is the result.

>>> print(d)
{'Sheet': Empty DataFrame
Columns: []
Index: [], 'report': Empty DataFrame
Columns: []
Index: [], 'Seg10_results':    ID      Hora de início   Hora de conclusão      Email  ...  Humanas  Exatas  Linguagens Biológicas
0   1 2021-04-28 13:38:51 2021-04-28 16:25:59  anonymous  ...       38      50          38         38 
1   2 2021-04-28 17:02:11 2021-04-28 17:57:48  anonymous  ...       25       0          25         38 

[2 rows x 105 columns], 'Seg1_results':     ID      Hora de início   Hora de conclusão  ... Exatas Linguagens  Biológicas
0    1 2020-05-26 08:30:00 2020-05-26 09:15:00  ...     25         29          38
1    2 2020-05-26 08:31:12 2020-05-26 09:21:38  ...     38         33          38
2    3 2020-05-26 08:27:40 2020-05-26 09:21:38  ...     50         29          38

Then I try to print just some of the columns of each df (doing it manually for now):

lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg10_results'].columns[lista_colunas].values

But I am getting only this:

>>> d['Seg10_results'].columns[lista_colunas].values
array(['NOME COMPLETO', 'QUAL A SUA OFICINA DE APRENDIZAGEM?', 'Humanas',
       'Exatas', 'Linguagens', 'Biológicas'], dtype=object)

No values are shown, only the column names.

If I call only d['Seg10_results'][lista_colunas], I get this:

>>> d['Seg10_results'][lista_colunas]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3461, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([7, 10, 101, 102, 103, 104], dtype='int64')] are in the [columns]"

What am I doing wrong?

By the way, this is part of a larger project. All I am trying to do is filter some columns from all worksheets and save them into a new xlsx file (again, separated by worksheet, but filtered).

Adding my solution for exporting to a single file with multiple sheets

I know this is far from beautiful code, but it is working at the moment.

import pandas as pd

# sheet_name=None reads every sheet into a dict of DataFrames
dados = pd.read_excel("results/report.xlsx", sheet_name=None)
df = pd.concat(dados[frame] for frame in dados)

lista_colunas = [7, 10, 101, 102, 103, 104]
filtro = df.columns[lista_colunas]
final_df = df[filtro]

# group by the second of the selected columns
grouped_df = final_df.groupby(final_df.columns[1])

# the with block saves and closes the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('results/resultado.xlsx', engine='xlsxwriter') as writer:
    for sala, splitdf in grouped_df:
        splitdf.to_excel(writer, sheet_name=str(sala))

Marvin · Answer 1 · 2021-07-17T11:21:11.950

1

d['Seg10_results'][lista_colunas] is basically d['Seg10_results'][[7, 10, 101, 102, 103, 104]], which looks columns up by label, and none of the items in lista_colunas is an actual column label in d['Seg10_results'].

You might want to either:

use pandas.DataFrame.iloc, which selects by position, for example:

d['Seg10_results'].iloc[:, lista_colunas]

or store d['Seg10_results'].columns[lista_colunas].values in a variable, e.g. cols, and do:

d['Seg10_results'][cols]
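The difference between the two lookups can be sketched on a small made-up DataFrame. The column names and values below are illustrative stand-ins, not the asker's real data, and the variable names are assumptions for this example only.

```python
import pandas as pd

# Hypothetical stand-in for one sheet; the real one has 105 columns.
df = pd.DataFrame({
    "ID": [1, 2],
    "NOME COMPLETO": ["Ana", "Bruno"],
    "Humanas": [38, 25],
    "Exatas": [50, 0],
})

positions = [1, 2, 3]  # column positions, not column labels

# df[positions] looks for columns *labelled* 1, 2 and 3, so it raises KeyError.
try:
    df[positions]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Option 1: select by position with iloc.
by_position = df.iloc[:, positions]

# Option 2: translate positions into labels first, then select by label.
cols = df.columns[positions]
by_label = df[cols]

print(label_lookup_failed)
print(by_position.equals(by_label))
```

Both options should return the same three columns; which one to use is mostly a matter of whether the labels are needed later.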

edited Jul 17 '21 at 11:21

answered Jul 17 '21 at 11:06

Marvin


Storing the values worked fine. Thank you. But is it possible (probably yes) to create a loop and automate this for all sheets? 'Seg10_results' is just one of them, and the names are not fixed. – Adilson V Casula Jul 17 '21 at 11:29
Got it. I put everything into one DataFrame using this tutorial: https://github.com/marsja/jupyter/blob/master/Reading%20Multiple%20Spreadsheets%20using%20Pandas.ipynb and then split it by filtering on a column, using this answer: https://stackoverflow.com/a/65692842/7444022 – Adilson V Casula Jul 17 '21 at 13:46
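For the loop asked about in the first comment, one possible sketch is to walk the dict of sheets and apply the same positional selection to each one. The `filter_sheets` helper and the sample data are hypothetical, introduced only for illustration. Sheets with too few columns (such as the empty 'Sheet' and 'report' frames in the question's output) are skipped here, since positional selection on them would raise an IndexError.

```python
import pandas as pd


def filter_sheets(sheets, positions):
    """Return a new dict keeping only the columns at `positions` in each sheet.

    Sheets with too few columns (for example, empty ones) are skipped.
    """
    filtered = {}
    for name, df in sheets.items():
        if df.shape[1] <= max(positions):
            continue  # not enough columns for the requested positions
        filtered[name] = df.iloc[:, positions]
    return filtered


# Made-up stand-in for the dict returned by pd.read_excel(..., sheet_name=None).
sheets = {
    "Sheet": pd.DataFrame(),
    "Seg1_results": pd.DataFrame({"ID": [1], "Nome": ["Ana"], "Exatas": [25]}),
    "Seg10_results": pd.DataFrame(
        {"ID": [1, 2], "Nome": ["Bia", "Caio"], "Exatas": [50, 0]}
    ),
}

result = filter_sheets(sheets, [1, 2])
for name, df in result.items():
    print(name, list(df.columns))
```

Each filtered frame could then be written to its own worksheet with `df.to_excel(writer, sheet_name=name)` inside a `with pd.ExcelWriter(...) as writer:` block, which keeps the sheets separate instead of concatenating them first.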

score 0 · Answer 2 · answered Jul 17 '21 at 01:52


You don't need to add .columns, nor do you need values.

Instead of:

d['Seg10_results'].columns[lista_colunas].values

Try:

d['Seg10_results'][lista_colunas]

answered Jul 17 '21 at 01:52

U13-Forward


If I do this, I get an error. I have updated the question with it. – Adilson V Casula Jul 17 '21 at 10:42

score 0 · Answer 3 · answered Jul 17 '21 at 11:20


Your columns are named and indexed, so I think you'll need to call them by name.

Here is a code snippet showing a common way to select columns from a pandas DataFrame:


import pandas as pd

btc = pd.read_csv('BTC_Analysis/BTC-USD.csv')

# read_csv already returns a DataFrame, so no empty DataFrame is needed
dataframe = btc

print(dataframe['Date'])

For multiple columns, pass a list of names (note the second pair of brackets):

print(dataframe[['Date', 'Open']])

Here's some quick info from Pandas docs:

https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

Additionally you may find some useful information in this tutorial

P.S. Handling XLSX files can be cumbersome; if possible, it's usually easier to use the CSV format.

answered Jul 17 '21 at 11:20

Asutherland8219


This could work, but first I need to select the df inside my dict. – Adilson V Casula Jul 17 '21 at 11:33
Ah sorry, I missed that part. I think this might be the ideal way to do it: split the sheets into their own respective DataFrames, then go from there. https://stackoverflow.com/a/26521726/14999516 – Asutherland8219 Jul 17 '21 at 12:14
Yes, that is what I did. But it creates a dictionary of DataFrames, and now I can't run a loop to get the selected columns for every df in the dict. – Adilson V Casula Jul 17 '21 at 12:23
