
Image of CSV

I have a blob (CSV) in a database. I read it into a string buffer and created a pandas DataFrame from it. The CSV file has no column names for some columns, and some column names are repeated.
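A minimal sketch of that setup, assuming the blob has already been decoded to a string. The `blob_text` below is hypothetical sample data, not the asker's file. Reading with `header=None` keeps the header line as ordinary data, so the missing and repeated names never become column labels:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the decoded database blob:
# the first header cell is empty and the name "x" is repeated.
blob_text = """,x,x,y
r1,1,2,3
r2,4,5,6
"""

# header=None treats the header line as data, so pandas labels the
# columns by position (0..n-1) instead of by the problematic names.
df = pd.read_csv(io.StringIO(blob_text), header=None)
print(df.columns.tolist())  # [0, 1, 2, 3]
print(df.shape)             # (3, 4)
```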

For example: I need to fetch the value at the intersection of B5 = search_row and E2 = search_column, i.e. E5 = value_to_be_fetched.

I only have the text values search_row and search_column. How do I find the row index (B5) and the column index (E2), and then fetch the value E5 = value_to_be_fetched?
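The cell references above correspond to 0-based positions: the row number minus one, and the column letter as an offset from A. A small sketch, assuming the CSV is read with `header=None` (so the first CSV line is spreadsheet row 1) and has at most 26 columns; `to_cell` is a hypothetical helper:

```python
import string


def to_cell(row_pos, col_pos):
    """Hypothetical helper: turn 0-based (row, column) positions into a
    spreadsheet-style label. Assumes the first CSV line is row 1 and
    that there are at most 26 columns (A-Z)."""
    return f"{string.ascii_uppercase[col_pos]}{row_pos + 1}"


print(to_cell(4, 1))  # B5  (search_row)
print(to_cell(1, 4))  # E2  (search_column)
print(to_cell(4, 4))  # E5  (value_to_be_fetched)
```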

Tobiah Rex
Gayatri
  • Do you want `val = df.loc[search_row, search_column]`? – jezrael Jan 28 '18 at 06:30
  • Yes, but how do I find the index of "search_row" and "search_column"? I cannot set any column as the index because some columns have no header at all and some column names are repeated. – Gayatri Jan 28 '18 at 06:31
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jan 28 '18 at 06:32
  • Hmmm, so are the values `search_row` and `search_column` unique in the whole data? Because if not, with duplicates in columns and indices it is only possible to select by position: `df.iloc[pos_index, pos_column]`. – jezrael Jan 28 '18 at 06:47
  • Yeah, the data are unique in the CSV file. The confusion here is that I have to find the value where row="search_row" and column="search_column", and there are no column headers. – Gayatri Jan 28 '18 at 06:57
  • I am new to Python and pandas DataFrames. Any sample code would be of great help. – Gayatri Jan 28 '18 at 07:15
  • No problem that you are new, but your problem is very rare, because usually it is clear which column should be used for finding the data; then [`boolean indexing`](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing) is used. But if columns and indices are duplicated and it is not possible to use the position of a column for finding the data, it is really complicated; the solution is below. – jezrael Jan 28 '18 at 07:19
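For the common case described in the last comment, where it is known which column holds the row keys, boolean indexing combined with positional `iloc` works even when all column labels are duplicated. A minimal sketch with a hypothetical frame; the key column (position 0) and value column (position 2) are assumptions for this toy data:

```python
import pandas as pd

# Hypothetical frame whose column labels are all the same, as in the question.
df = pd.DataFrame([['k1', 10, 20],
                   ['k2', 30, 40]])
df.columns = ['A'] * 3

# Labels are ambiguous, so pick columns by position with iloc.
# Boolean mask: rows whose first column equals 'k2' ...
mask = df.iloc[:, 0] == 'k2'
# ... then take the third column (position 2) of the matching rows.
print(df.loc[mask].iloc[:, 2].tolist())  # [40]
```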

1 Answer


If the values search_row and search_column are unique in the whole data, use `np.where` to get their positions and select by `DataFrame.iloc`:

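```python
import numpy as np
import pandas as pd

# Sample data with a duplicated index and duplicated column names
df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,500,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,300,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')}, index = [1] * 6)
df.columns = ['A'] * 6
print(df)
#    A    A  A    A  A  A
# 1  a    4  7    1  5  a
# 1  b    5  8  300  3  a
# 1  c    4  9    5  6  a
# 1  d    5  4    7  9  b
# 1  e  500  2    1  2  b
# 1  f    4  3    0  4  b

# np.where returns (row positions, column positions) of matching cells:
# take the row position of 500 and the column position of 300
a = np.where(df == 500)[0]
b = np.where(df == 300)[1]
print(a)
# [4]
print(b)
# [3]

# value at the intersection of that row and that column
c = df.iloc[a[0],b[0]]
print(c)
# 1
```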
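The same two steps can be wrapped in a function that takes the question's text values directly. This is a sketch, not part of the original answer: `fetch_intersection` is a hypothetical name, the toy grid places `search_row` at B5 and `search_column` at E2 to mirror the question, and the function assumes each value occurs exactly once:

```python
import numpy as np
import pandas as pd


def fetch_intersection(df, search_row, search_column):
    """Hypothetical helper: return the cell in the row that contains
    `search_row` and the column that contains `search_column`.
    Assumes each search value occurs exactly once in the frame."""
    rows = np.where(df == search_row)[0]     # row positions holding search_row
    cols = np.where(df == search_column)[1]  # column positions holding search_column
    if len(rows) != 1 or len(cols) != 1:
        raise ValueError("each search value must occur exactly once")
    return df.iloc[rows[0], cols[0]]


# Hypothetical 5x5 grid, as if read with header=None
df = pd.DataFrame([[''] * 5 for _ in range(5)])
df.iloc[4, 1] = 'search_row'           # B5
df.iloc[1, 4] = 'search_column'        # E2
df.iloc[4, 4] = 'value_to_be_fetched'  # E5

print(fetch_intersection(df, 'search_row', 'search_column'))
# value_to_be_fetched
```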

But if the values are duplicated, `np.where` returns arrays with more than one position, so it is only possible to select the first occurrence:

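```python
# 5 and 2 occur several times, so np.where returns several positions
a = np.where(df == 5)[0]
b = np.where(df == 2)[1]
print(a)
# [0 1 2 3]
print(b)
# [2 4]

# only the first match of each is used
c = df.iloc[a[0], b[0]]
print(c)
# 7
```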

```python
# same idea with the roles of 2 and 5 swapped
a = np.where(df == 2)[0]
b = np.where(df == 5)[1]
print(a)
# [4 4]
print(b)
# [4 1 3 1]

c = df.iloc[a[0], b[0]]
print(c)
# 2
```
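When duplicates are possible, it may help to list every match before picking one. A sketch using `np.argwhere`, which returns one `[row, column]` pair per matching cell; this is an alternative to taking `a[0]`/`b[0]` without checking, not something the answer itself uses:

```python
import numpy as np
import pandas as pd

# Same frame as in the answer (duplicated labels on both axes)
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 500, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 300, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')}, index=[1] * 6)
df.columns = ['A'] * 6

# One [row, column] position pair per cell equal to 5, in row-major order
hits = np.argwhere((df == 5).to_numpy())
print(hits.tolist())  # [[0, 4], [1, 1], [2, 3], [3, 1]]
```

Each pair can be passed to `df.iloc[row, column]`, and `len(hits) > 1` is a simple signal that the value is not unique.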
jezrael