0

I was on here a couple days ago, and I am going to try to reword my question better. I am working with Python and Pandas and trying to do a certain task. I am working with two dataframes and they both have student IDs in there. I want to be able to merge a new dataframe together if column 2 contains a student id that is in column one. Does anybody know how to do this? I tried researching and using pd.melt(),pd.merge(), and join. But I do not believe it does not get the job done for me.

Example in a df1 column, "student", id is 1234. If df2 has that same student ID, put it in the dataframe. That way I have all the grades of one particular student in one df and they are in a different df from a different group of students.

Unless you truly find the answer, please do not just mark as duplicate because I have been searching for days for the answer.

  • It would be easier if you made example output. Like what df1 and df2 look like and what you expect as a result. – Ivan86 Dec 19 '17 at 00:05
  • I cannot really release information, but I'll do this I guess: `"Assignment " "Grade" "Student" "Date" A 85 112345 8-23-17` – DanielDDore Dec 19 '17 at 00:08
  • That is what it basically looks like. All I want to do is add to that dataframe if df#2 has matching student ids in it as does df#1. Then I have a collection of a whole group of certain students with the grades. Because they are allowed multiple submissions. – DanielDDore Dec 19 '17 at 00:11
  • so you would like to make a df3 that only has the same elements of df1 and df2 – Ivan86 Dec 19 '17 at 00:12
  • df3 would have be consisted of matching ids in df1 and df2. @Ivan86 – DanielDDore Dec 19 '17 at 00:12
  • and the rest erased from both df1 and df2? – Ivan86 Dec 19 '17 at 00:17
  • Ok i'll work something out. – Ivan86 Dec 19 '17 at 00:17
  • Well that do not have to be necessary erased. But I just want a df that contains all the grade values from df1 and df2 based on student id numbers that match both dataframes. – DanielDDore Dec 19 '17 at 00:24
  • It's done in one line [Finding common rows (intersection) in two Pandas dataframes](https://stackoverflow.com/a/30535957/1248974) with `how='inner'` – chickity china chinese chicken Dec 19 '17 at 02:34

1 Answers1

0

Try this:

df1 = pd.DataFrame({"Assignment": ["A","B"], "Grade": ["85", "87"],  "Student": ["112345","123234"],  "Date": ["8-23-17","3-21-17"]}, columns=["Assignment","Grade","Student","Date"], index=range(2))
df2 = pd.DataFrame({"Assignment": ["B", "F"], "Grade": ["75", "22"],  "Student": ["112345","123213"],  "Date": ["7-28-17", "4-12-17"]}, columns=["Assignment","Grade","Student","Date"], index=range(2))

print(df1)
//        Assignment Grade Student Date
//     0  A          85    112345  8-23-17
//     1  B          87    123234  3-21-17

print(df2)
//        Assignment Grade Student Date
//     0  B          75    112345  7-28-17
//     1  F          22    123213  4-12-17

Now for the extracting of duplicates:

df3 = pd.DataFrame()

for index1, value1 in df1.loc[:,['Student']].stack().iteritems():
    notYetAddedFirst = True
    for index2, value2 in df2.loc[:,['Student']].stack().iteritems():
        if value1 == value2:
            if notYetAddedFirst:
                df3 = pd.concat([df3,df1.loc[index1[0]:index1[0]]]).reset_index().drop(['index'], axis=1)
                notYetAddedFirst = False
            df3 = pd.concat([df3,df2.iloc[index2[0]:index2[0]+1]]).reset_index().drop(['index'], axis=1)

print(df3)
//         Assignment Grade Student Date
//      0  A          85    112345  8-23-17
//      1  B          75    112345  7-28-17
Ivan86
  • 5,695
  • 2
  • 14
  • 30