1

I have two dataframes:

ONE=pd.read_csv('ONE.csv')
value_one  value_two  
2          4          
3          1          
4          2          

TWO=pd.read_csv('TWO.csv')
X  1    2    3    4    5    6    7    8
1  a    c    j    a    d    c    c    d
2  c    k    a    d    c    c    d    e
3  f    c    k    a    d    c    c    d
4  c    k    a    d    c    c    d    j

I need to create additional column in ONE dataframe ( ONE['result']) in conditions:

if value_one is equal to value from header of dataframe TWO

and value_two is equal to value from TWO dataframe in X column,

set in new column common value.

expected result:

value_one  value_two  result   
2          4          k
3          1          j
4          2          d

I tried: use to compare only header if ONE[value_one]==TWO.iloc[0]

Thank you, S.

jpp
  • 159,742
  • 34
  • 281
  • 339
sygneto
  • 1,761
  • 1
  • 13
  • 26

1 Answers1

1

lookup

You can lookup your second dataframe:

df_two = df_two.set_index('X')  # set 'X' column as index
df_two.columns = df_two.columns.astype(int)  # ensure column labels are numeric

df_one['result'] = df_two.lookup(df_one['value_two'], df_one['value_one'])

print(df_one)

   value_one  value_two result
0          2          4      k
1          3          1      j
2          4          2      d
jpp
  • 159,742
  • 34
  • 281
  • 339
  • thanks, i had to edit your code `df_two = df_two.set_index('X',inplace=True)` and add one more line because of some problems : `df_two.reset_index(inplace=True, drop=True)` but i am receiving at the end still one error: `raise KeyError('One or more row labels was not found') KeyError: 'One or more row labels was not found' ` any idea why? ` maybe in my original file there are more lines than in second table, and python is not finding comparsion for them? could we do any exception for it? – sygneto Dec 05 '18 at 17:47
  • Do **not** use `df_two = df_two.set_index('X', inplace=True)`. Either use `df_two.set_index('X', inplace=True)` (without assignment) or `df_two = df_two.set_index('X')` as in my example. See [here](https://stackoverflow.com/questions/43893457/python-pandas-understanding-inplace-true) for more details. – jpp Dec 05 '18 at 19:03
  • 1
    You probably have duplicate columns, as your error indicates. That's another problem, another question which has been asked many times. – jpp Dec 06 '18 at 08:02
  • but the point is that i dont see any duplicated columns, also to check it i used `df.T.drop_duplicates().T` but still i am reveiving ` this error: `raise InvalidIndexError('Reindexing only valid with uniquely' pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` – sygneto Dec 06 '18 at 08:23
  • Check by looking at `df.columns`. – jpp Dec 06 '18 at 09:07
  • `print(df_two.columns)` result: `Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')` – sygneto Dec 06 '18 at 10:17
  • So that's not the dataframe you've included above, which has an `'X'` column.. spot the difference? – jpp Dec 06 '18 at 10:26
  • huh, Your right, now i see , when i delete line `df_two = df_two.set_index('X')` my result of `print(df_two.columns)` is `Index(['X', '1', '2', '3', '4', '5', '6', '7', '8'], dtype='object')` but in this case i cannot use `df_two.columns = df_two.columns.astype(int)` anymore because of `TypeError: Cannot cast Index to dtype ` – sygneto Dec 06 '18 at 10:38