First of all, I am totally new on Python, so, maybe is something super simple I am not doing correctly.
I am reading a multiple worksheet xlsx file and sending each of them to separated dataframe. (at least, I think I am doing it).
xl = pd.ExcelFile("results/report.xlsx")
d = {} # your dict.
for sheet in xl.sheet_names:
d[f'{sheet}']= pd.read_excel(xl,sheet_name=sheet)
lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg3_results'].columns[lista_colunas].values
This is the result.
>>> print(d)
{'Sheet': Empty DataFrame
Columns: []
Index: [], 'report': Empty DataFrame
Columns: []
Index: [], 'Seg10_results': ID Hora de início Hora de conclusão Email ... Humanas Exatas Linguagens Biológicas
0 1 2021-04-28 13:38:51 2021-04-28 16:25:59 anonymous ... 38 50 38 38
1 2 2021-04-28 17:02:11 2021-04-28 17:57:48 anonymous ... 25 0 25 38
[2 rows x 105 columns], 'Seg1_results': ID Hora de início Hora de conclusão ... Exatas Linguagens Biológicas
0 1 2020-05-26 08:30:00 2020-05-26 09:15:00 ... 25 29 38
1 2 2020-05-26 08:31:12 2020-05-26 09:21:38 ... 38 33 38
2 3 2020-05-26 08:27:40 2020-05-26 09:21:38 ... 50 29 38
Then, I am trying to print just some of columns of each df (trying it manually)
lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg10_results'].columns[lista_colunas].values
But I am getting only this:
>>> d['Seg10_results'].columns[lista_colunas].values
array(['NOME COMPLETO', 'QUAL A SUA OFICINA DE APRENDIZAGEM?', 'Humanas',
'Exatas', 'Linguagens', 'Biológicas'], dtype=object)
Any value is being shown
If I call only d['Seg10_results'][lista_colunas]
, I get this:
>>> d['Seg10_results'][lista_colunas]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3461, in __getitem__
indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
self._validate_read_indexer(keyarr, indexer, axis)
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([7, 10, 101, 102, 103, 104], dtype='int64')] are in the [columns]"
What am I doing wrongly?
In time, this is part of a major work. All I am trying to do it, filter some columns of all worksheets, and save them into a new xlsx file (again, separated by worksheets, but filtered)
Adding my solution to exporting for single file with multiple sheet
I know this is far from a beautiful code, but it is working at the moment.
dados = pd.read_excel("results/report.xlsx", sheet_name=None)
df = pd.concat(dados[frame] for frame in dados.keys())
lista_colunas = [7, 10, 101, 102, 103, 104]
filtro = df.columns[lista_colunas]
final_df = df[filtro]
grouped_df = final_df.groupby(final_df.columns[1])
salas = grouped_df.groups.keys()
writer = pd.ExcelWriter('results/resultado.xlsx', engine='xlsxwriter')
for sala in salas:
splitdf = grouped_df.get_group(sala)
splitdf.to_excel(writer, sheet_name=str(sala))
writer.save()