3

I have loaded a two column csv file in pandas data frame that has a unique key in column[0] (integer index) of every row and a value in column[1] of every row. It has no header.

I want simply to look up the single value in column[1] (or the whole row series) by using a key to match against the value column[0] to locate the row.

of course efficient etc. As if it were a dictionary with the string keys like value = dict.get(key)

example csv file:

a, "valueForA"
x, "valueForX"
z, "valueForZ"

...

df = pandas.read_csv(fileAbove)

row = df.wayToFindRowByColumn0Value('x')  # row is a series of 2 elements

row[0] should be the key 'x'
row[1] should be 'valueForX'

row = df.wayToFindRowByColumn0Value('notThere')  # row should be None if not found
Umar.H
  • 22,559
  • 7
  • 39
  • 74
peterk
  • 5,136
  • 6
  • 33
  • 47
  • Add small example dataset and what you expected output is. [How to make a good reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Erfan Aug 11 '19 at 21:55

5 Answers5

3

My suggestion is to store the values as a simple pandas Series (rather than a data frame) using the key field as index and the value field as series values. This can be achieved by calling the read_csv method in this way (some details are from the OP's answer).

s = pd.read_csv('/data/companyDataByCusip.csv', encoding='utf_8', names=['cusip','value'], index_col='cusip', squeeze=True)

Now you can access the Series just like a dictionary:

val = s[key]

If a data frame is really needed, you could remove the squeeze argument from read_csv and access the value using df.at[key,'value'].

GZ0
  • 4,055
  • 1
  • 10
  • 21
1

Use df.at[index, column]. See documentation here.

For example:

import pandas as pd

df = pd.DataFrame(data=[{'value': x*x} for x in range(10)], 
                  index=[x for x in range(10)])
print(df)
print('At 4: {}'.format(df.at[4, 'value']))

Output:

   value
0      0
1      1
2      4
3      9
4     16
5     25
6     36
7     49
8     64
9     81

At 4: 16

Update:

To get a dataframe as a return, can try something like this:

row = df[df.index.values == 4]
print(row)

Output:

   value
4     16
alexbclay
  • 1,389
  • 14
  • 19
  • almost right except I want to get the row series or value in integer indexed column [1] for the row with the non-integer string name in column[0] of the row. df.at takes two non integer lookup names, and df.iat takes two integer indices like indexing a matrix. – peterk Aug 11 '19 at 22:17
  • Does it need to be a pandas Series? Or can you just build your own list? ie: `row = [4, df.at[4, 'value']]` – alexbclay Aug 11 '19 at 22:35
  • no I could parse and use a dict or other map type. I have been asked to do this in pandas. – peterk Aug 11 '19 at 22:44
0

You can try:

df[df.column0==key].column1.iloc[0]
kederrac
  • 16,819
  • 6
  • 32
  • 55
0

A simple loc can do th e job:

df.loc[[integer index]]
M-M
  • 440
  • 2
  • 16
0

Ok poking around looks like using .loc() is the way but it reqires you set a column explicitly as the row index field, and a set explicit column names. It works, though I havn't speed tested it.

The doc I found is here:

https://www.kdnuggets.com/2019/04/pandas-dataframe-indexing.html

#  read in the file of the cusip database to a data frame
df = pandas.read_csv('/data/companyDataByCusip.csv',encoding='utf_8', names=['cusip','value'])
df.set_index('cusip',inplace=True)

rec = df.loc[cusip,:]
print(rec) # series for row
peterk
  • 5,136
  • 6
  • 33
  • 47