-1

I have a dataframe of id's and numbers:

df_id =  id|values
         11|100
         22|109
         33|400
         44|90
         55|1000

I also have a dataframe that has id and age like this:

df =  id|age
      11|23
      22|21
      33|25
      44|20
      55|30

I want to get the age values from the second dataframe based on the id of the first dataframe, so it'd be: 11 is 23, 22 is 21, ...?

I have tried to query and read about it, but I don't know how to implement it exactly

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
mouhamad
  • 51
  • 6
  • join dfs on id? – lone_ranger Apr 22 '22 at 20:47
  • I assume you're using Pandas, so I added the tag for you. If that's incorrect, you can [edit] to fix it. – wjandrea Apr 22 '22 at 20:48
  • What output are you trying to get? `df_id` with the `age` column joined to it, or just the `age` column selected by the `df_id` index? Please [edit] to clarify. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Apr 22 '22 at 20:51
  • @lone_ranger is there a different way of doing it without merge? – mouhamad Apr 22 '22 at 20:51
  • 1) why without merge, it is just the obvious way. 2) you still didnt answer @wjandrea what your output shoud look like. do you want a df, or just one single list of values. – Rabinzel Apr 22 '22 at 20:55

1 Answers1

1

Approach without merge (not suggested)

This is a pure pythonic approach, but not suggested due to inefficiency.

def splitBar(strValue):
  return strValue.strip().split("|")

with open("df.csv") as f:
  f.readline() # to remove the first line
  df = f.readlines()
  df = list(map(splitBar, df))
  idDf = [x[0] for x in df]
  age = [x[1] for x in df]

with open("df_id.csv") as f:
  f.readline() # to remove the first line
  df_id = f.readlines()
  df_id = list(map(splitBar, df_id))
  idDf_id = [x[0] for x in df_id]
  value = [x[1] for x in df_id]

[[id,value[index], age[index]] for index, id in enumerate(idDf_id) if id in idDf]

Output

[['11', '100', '23'],
 ['22', '109', '21'],
 ['33', '400', '25'],
 ['44', '90', '20'],
 ['55', '1000', '30']]

Note that, I assumed the name of the files you are dealing with are df.csv and df_id.csv. Also, note that the output matrix has three columns: the first one is the id, the second is the values and the last one is the age.

Merge approach (suggested)

If you are using pandas module, you should consider using merge function:

df_id.merge(df, on="id")
TheFaultInOurStars
  • 3,464
  • 1
  • 8
  • 29
  • is there a different way to do it without merge? – mouhamad Apr 22 '22 at 20:51
  • @mouhamad Of course, but please mention how you want the approach to be. Do you want to use pure pythonic approaches? – TheFaultInOurStars Apr 22 '22 at 20:52
  • When this is a trivial merge, please close as duplicate to [merge 101](https://stackoverflow.com/questions/53645882/pandas-merging-101), **everything** is explained in details there. – mozway Apr 22 '22 at 20:52
  • @mozway Thanks for the comment. Seems it is not a trivial merge. So I will vote on the reopening the question and edit my answer to be pure pythonic script afterward. – TheFaultInOurStars Apr 22 '22 at 20:55
  • @AmirhosseinKiani so what i saw online is using the query functionality but it didnt work well so either a pure pythonic approach or something like query would do the trick – mouhamad Apr 22 '22 at 20:55
  • This is a trivial merge and the duplicate provided non-merge alternatives. – mozway Apr 22 '22 at 20:56
  • @mouhamad the real question is, why don't you want to use `merge`? Is this an assignment? – mozway Apr 22 '22 at 20:58