
How do I select multiple columns from an Excel sheet using openpyxl and make a DataFrame in pandas? The sheet looks something like this. I want to select columns A, B, D, E, and G. Thanks!

[Screenshot of the Excel sheet]

W_YY
  • Related: https://stackoverflow.com/a/49442625/6672746 – Evan Nov 16 '18 at 16:56
  • Thanks for pointing that out, but I need to use openpyxl. Isn't pd.read_excel something different? – W_YY Nov 16 '18 at 17:07
  • Well, you asked to make a dataframe in `pandas`, for which you will obviously need pandas. Is there a reason you can't use `pd.read_excel`? – Evan Nov 16 '18 at 17:09
  • Fair. I'm trying to avoid it only because this is part of my task, and switching to pd.read_excel would mean more work in other areas. – W_YY Nov 16 '18 at 17:12

1 Answer


Given a file "2018-11-16.xlsx" containing data like that in your image:

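import pandas as pd
df = pd.read_excel("2018-11-16.xlsx", header=0, usecols="A,B,D,E,G", index_col=0)  # column A (the dates) becomes the index
df = df.drop(df.index[0:2])  # drop() returns a new DataFrame, so reassign the result
print(df)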

Output:

           Code1 Code3 Code4 Code6
2018-01-01    23    89    23    23
2018-02-01    24     2    23    23
2018-03-01    25     3    23    23
2018-04-01    26     2    23    23
2018-05-01    27    71    23    23
2018-06-01    28    71    23    23
2018-07-01    29    71    23    23
2018-08-01    30    71    23    23

If your goal is a DataFrame, I recommend reading the file directly into pandas. openpyxl and pandas do work well together, but how to combine them depends on what you're trying to do: for numerical analysis, data munging, or plotting, read directly into pandas; for cell formatting, use openpyxl.
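Since the question asks specifically for openpyxl, here is a minimal sketch that reads only columns A, B, D, E, and G with openpyxl and then hands the values to pandas. It assumes the headers are in row 1 of the active sheet. The helper name `columns_to_dataframe` and the demo workbook layout (a "Date" column followed by "Code1" to "Code6") are hypothetical, made up only so the example runs on its own.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils import column_index_from_string


def columns_to_dataframe(path, letters, header_row=1):
    """Hypothetical helper: read the listed column letters from the active
    sheet into a DataFrame. Assumes `header_row` (1-based, as in Excel)
    holds the column names and every row below it is data."""
    # data_only=True returns cached results of formula cells, not the formulas
    wb = load_workbook(path, data_only=True)
    ws = wb.active
    # Convert letters to 0-based positions inside each row tuple: "A" -> 0, "D" -> 3
    positions = [column_index_from_string(letter) - 1 for letter in letters]
    rows = []
    for row in ws.iter_rows(min_row=header_row, values_only=True):
        # Use None if a row is shorter than the requested column
        rows.append([row[i] if i < len(row) else None for i in positions])
    wb.close()
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=header)


# Demo only: build a small workbook with an assumed layout, then read it back
with tempfile.TemporaryDirectory() as tmp:
    demo = Workbook()
    sheet = demo.active
    sheet.append(["Date", "Code1", "Code2", "Code3", "Code4", "Code5", "Code6"])
    for month in range(1, 4):
        sheet.append([datetime(2018, month, 1), month, 0, month * 10, 23, 0, 23])
    path = os.path.join(tmp, "demo.xlsx")
    demo.save(path)

    df = columns_to_dataframe(path, ["A", "B", "D", "E", "G"])
    df = df.set_index("Date")  # use column A as the index, like index_col=0 above

print(df)
```

Note that for .xlsx files, recent versions of pandas (1.2 and later) use openpyxl as the default engine for pd.read_excel, so the one-liner above already relies on openpyxl under the hood. The manual version is mainly useful when the code around it already works with openpyxl worksheets.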

Evan