0

How would I go about looking up a specific value in a column, then selecting a subset of the column values in that row in pandas?

I have a CSV with a column full of US state names and columns with attributes about each state, but I only want certain values about the states I'm looking up.

For example, there are 50 rows (for the 50 states) and 20 columns with various data about each state, and I want to select Colorado and Florida and only 5 of the column values for those states.

Here is my code that I want to modify:

import glob
import pandas as pd
import os
import csv
 
myList = []
 
path = "/path/to/source/files/*.csv"
 
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    row = df.loc[df['Province_State'] == 'Pennsylvania']
 
    # Put the date in, derived from the CSV name
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    row['Date'] = dateFromFilename
 
    myList.append(row)
    print(row)
 
concatList = pd.concat(myList, sort=True)
 
concatList.to_csv('/path/to/output.csv', index=False, header=True)
reallymemorable
  • 882
  • 1
  • 11
  • 28

1 Answers1

1

You just need to use isin and pass list of columns to loc:

myList = []

path = "/path/to/source/files/*.csv"

col_lists = ['col1','col2','col3']
 
for fname in glob.glob(path):
    df = pd.read_csv(fname)

    # changes here
    row = df.loc[df['Province_State'].isin('Florida', 'Colorado'),
                 col_list]

    # pivot
    row = (row.assign(idx=row.groupby('Province_State').cumcount()).     
              .pivot(index='idx', columns='Province_State')
          )
    # rename
    row.columns = [f'{x}_{y}' for x,y in row.columns]          

    # Put the date in, derived from the CSV name
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    row['Date'] = dateFromFilename
 
    myList.append(row)
    print(row)
 
concatList = pd.concat(myList, sort=True)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • How would I rename the columns as I grab them? Because I will end up with two `col1`s for each of state. I need to end up with `Florida_col1`, `Colorado_col1`, etc. – reallymemorable Nov 13 '20 at 16:11
  • There was a syntax error here, i think, which I fixed: `row = (row.assign(idx=row.groupby('Province_State').cumcount()).pivot(index='idx', columns='Province_State')` but I am still getting one here, which I don't understand: `row.columns = [f'{x}_{y}' for x,y in row.columns]` – reallymemorable Nov 13 '20 at 16:24
  • Before that line `row` has two-level columns. See Question/Anwer 10/11 in [this thread](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) for details. – Quang Hoang Nov 13 '20 at 16:26
  • So I need to replace `x,y` with actual column names? – reallymemorable Nov 13 '20 at 16:31
  • No, that’s what the for loop is for. – Quang Hoang Nov 13 '20 at 16:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/224532/discussion-between-reallymemorable-and-quang-hoang). – reallymemorable Nov 13 '20 at 16:41