
I have a reasonable grasp of pandas now, and I'm trying to analyse a CSV file. Instead of looking at one row and one column, I want to look at one row and two columns, with the intention of later expanding the number of columns depending on the CSV file.

My code is:

import pandas as pd

df = pd.read_csv("UNdata_Export_20180402_123348163.csv")
df = df.set_index(["Country or Area"])
df3 = df[df.columns[0:3]]
df3=df.loc["Australia"]
print(df3)

So the output is:

                 Year  Count  Rate   Source Source Type
Country or Area                                        
Australia        2010    229   1.0  CTS/NSO          CJ
Australia        2009    263   1.2  CTS/NSO          CJ
Australia        2008    261   1.2  CTS/NSO          CJ
Australia        2007    255   1.2  CTS/NSO          CJ
Australia        2006    281   1.4  CTS/NSO          CJ
Australia        2005    259   1.3  CTS/NSO          CJ
Australia        2004    264   1.3  CTS/NSO          CJ
Australia        2003    302   1.5  CTS/NSO          CJ
Australia        2002    318   1.6  CTS/NSO          CJ
Australia        2001    310   1.6  CTS/NSO          CJ
Australia        2000    302   1.6  CTS/NSO          CJ
Australia        1999    343   1.8  CTS/NSO          CJ
Australia        1998    285   1.5  CTS/NSO          CJ
Australia        1997    321   1.7  CTS/NSO          CJ
Australia        1996    312   1.7  CTS/NSO          CJ
Australia        1995    326   1.8  CTS/NSO          CJ

I'm struggling to select only the Year and Rate columns: the code above prints every column for the chosen country, Australia. I'm also not sure how to use "df3=df[df.columns[0:3]]"; changing the number 3 seems to have no effect.

Question: How can I select more than one specific column, say two? And from there, how could I select three or more columns? What values would I need to change?

I have looked at the Python API and could not find a similar question.

EDIT: This question is different to the linked question because I'm choosing a specific row as well as specific columns. From my understanding, the other question's rows are fine, and they are not attempting to choose specific rows.

  • Try `df3=df.loc["Australia", df.columns[0:3]]`. The idea is you can select rows and columns by label simultaneously. – jpp Apr 09 '18 at 09:57
  • Possible duplicate of [Selecting columns in a pandas dataframe](https://stackoverflow.com/questions/11285613/selecting-columns-in-a-pandas-dataframe) – Nihal Apr 09 '18 at 09:57

2 Answers


For selecting the first n columns:

df.iloc[:, :n]

For selecting a specific set of columns based on names:

selection = ['Count', 'Rate']
df[selection]
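As a rough sketch of both selections on a small frame: the three Australia rows below are copied from the question's output, while the "Exampleland" row and the value of `n` are made up purely for illustration. Note that once "Country or Area" is the index, it is not counted among the columns.

```python
import pandas as pd

# Australia rows come from the question's output;
# the "Exampleland" row is invented for illustration only.
df = pd.DataFrame(
    {
        "Country or Area": ["Australia", "Australia", "Australia", "Exampleland"],
        "Year": [2010, 2009, 2008, 2010],
        "Count": [229, 263, 261, 10],
        "Rate": [1.0, 1.2, 1.2, 0.5],
    }
).set_index("Country or Area")

n = 2  # hypothetical value; change it to take more or fewer columns

# First n columns by position (the index is not one of them)
first_n = df.iloc[:, :n]

# A specific set of columns by name
selection = ["Count", "Rate"]
by_name = df[selection]

print(first_n.columns.tolist())  # ['Year', 'Count']
print(by_name.columns.tolist())  # ['Count', 'Rate']
```

Both calls keep every row; restricting to one country as well needs a row selector in the same call.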
cdwoelk

Here are two possible solutions if you want to select by names and by positions together:

print (df.columns[:2])
Index(['Year', 'Count'], dtype='object')

# select by names only; the column names are obtained by slicing df.columns
df3 = df.loc["Australia", df.columns[:2]]

This works the same way as selecting by explicit names in both the index and the columns:

df3 = df.loc["Australia", ['Count', 'Rate']]
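For the Year and Rate columns asked about, a minimal sketch might look like the following. The Australia rows are taken from the question's output and the "Exampleland" row is invented for illustration. In the question's code, the second assignment to `df3` replaces the column slice with the full Australia rows, which is why changing the 3 had no visible effect.

```python
import pandas as pd

# Australia rows come from the question's output;
# the "Exampleland" row is invented for illustration only.
df = pd.DataFrame(
    {
        "Country or Area": ["Australia", "Australia", "Australia", "Exampleland"],
        "Year": [2010, 2009, 2008, 2010],
        "Count": [229, 263, 261, 10],
        "Rate": [1.0, 1.2, 1.2, 0.5],
    }
).set_index("Country or Area")

# Rows and columns are chosen in a single .loc call: rows first, then columns
two_cols = df.loc["Australia", ["Year", "Rate"]]

# Selecting three (or more) columns only requires a longer list of names
three_cols = df.loc["Australia", ["Year", "Count", "Rate"]]

print(two_cols)
print(three_cols)
```

Because "Australia" appears on several index rows, each result is a DataFrame with one row per year rather than a single Series.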

To select by positions only, use iloc and get_loc:

# select by positions only; get_loc gives the position(s) of the index label
df3 = df.iloc[df.index.get_loc("Australia"), 0:2]
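A small check that the positional form matches the label-based one, as a sketch on sample data (Australia rows from the question's output, the "Exampleland" row made up). For a label that repeats in the index, `get_loc` returns a slice or a boolean mask rather than a single integer, and `iloc` accepts either.

```python
import pandas as pd

# Australia rows come from the question's output;
# the "Exampleland" row is invented for illustration only.
df = pd.DataFrame(
    {
        "Country or Area": ["Australia", "Australia", "Australia", "Exampleland"],
        "Year": [2010, 2009, 2008, 2010],
        "Count": [229, 263, 261, 10],
        "Rate": [1.0, 1.2, 1.2, 0.5],
    }
).set_index("Country or Area")

# Position(s) of the label in the index
pos = df.index.get_loc("Australia")

# Purely positional selection: rows at pos, first two columns
by_position = df.iloc[pos, 0:2]

# Equivalent label-based selection
by_label = df.loc["Australia", df.columns[0:2]]

print(by_position.equals(by_label))  # True
```

If the label were unique, `get_loc` would return a single integer and `iloc` would give back a Series for that one row instead of a DataFrame.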
jezrael
  • Hi jezrael, Thanks for that! It's working great. I just tested three columns, and it works for that, too! –  Apr 09 '18 at 10:18