Create new df if value in df one column is included in df two same column name

Question

I was on here a couple days ago, and I am going to try to reword my question better. I am working with Python and Pandas and trying to do a certain task. I am working with two dataframes and they both have student IDs in there. I want to be able to merge a new dataframe together if column 2 contains a student id that is in column one. Does anybody know how to do this? I tried researching and using pd.melt(),pd.merge(), and join. But I do not believe it does not get the job done for me.

Example in a df1 column, "student", id is 1234. If df2 has that same student ID, put it in the dataframe. That way I have all the grades of one particular student in one df and they are in a different df from a different group of students.

Unless you truly find the answer, please do not just mark as duplicate because I have been searching for days for the answer.

It would be easier if you made example output. Like what df1 and df2 look like and what you expect as a result. — Ivan86, Dec 19 '17 at 00:05
I cannot really release information, but I'll do this I guess: `"Assignment " "Grade" "Student" "Date" A 85 112345 8-23-17` — DanielDDore, Dec 19 '17 at 00:08
That is what it basically looks like. All I want to do is add to that dataframe if df#2 has matching student ids in it as does df#1. Then I have a collection of a whole group of certain students with the grades. Because they are allowed multiple submissions. — DanielDDore, Dec 19 '17 at 00:11
so you would like to make a df3 that only has the same elements of df1 and df2 — Ivan86, Dec 19 '17 at 00:12
df3 would have be consisted of matching ids in df1 and df2. @Ivan86 — DanielDDore, Dec 19 '17 at 00:12
Well that do not have to be necessary erased. But I just want a df that contains all the grade values from df1 and df2 based on student id numbers that match both dataframes. — DanielDDore, Dec 19 '17 at 00:24
It's done in one line [Finding common rows (intersection) in two Pandas dataframes](https://stackoverflow.com/a/30535957/1248974) with `how='inner'` — chickity china chinese chicken, Dec 19 '17 at 02:34

Ivan86 · Accepted Answer · 2017-12-19T02:31:35.273

0

Try this:

df1 = pd.DataFrame({"Assignment": ["A","B"], "Grade": ["85", "87"],  "Student": ["112345","123234"],  "Date": ["8-23-17","3-21-17"]}, columns=["Assignment","Grade","Student","Date"], index=range(2))
df2 = pd.DataFrame({"Assignment": ["B", "F"], "Grade": ["75", "22"],  "Student": ["112345","123213"],  "Date": ["7-28-17", "4-12-17"]}, columns=["Assignment","Grade","Student","Date"], index=range(2))

print(df1)
//        Assignment Grade Student Date
//     0  A          85    112345  8-23-17
//     1  B          87    123234  3-21-17

print(df2)
//        Assignment Grade Student Date
//     0  B          75    112345  7-28-17
//     1  F          22    123213  4-12-17

Now for the extracting of duplicates:

df3 = pd.DataFrame()

for index1, value1 in df1.loc[:,['Student']].stack().iteritems():
    notYetAddedFirst = True
    for index2, value2 in df2.loc[:,['Student']].stack().iteritems():
        if value1 == value2:
            if notYetAddedFirst:
                df3 = pd.concat([df3,df1.loc[index1[0]:index1[0]]]).reset_index().drop(['index'], axis=1)
                notYetAddedFirst = False
            df3 = pd.concat([df3,df2.iloc[index2[0]:index2[0]+1]]).reset_index().drop(['index'], axis=1)

print(df3)
//         Assignment Grade Student Date
//      0  A          85    112345  8-23-17
//      1  B          75    112345  7-28-17

edited Dec 19 '17 at 02:31

answered Dec 19 '17 at 00:37

Ivan86

5,695
2
14
30

I edited to add the last lines of code for multirow dataframes – Ivan86 Dec 19 '17 at 00:42
Just copy this last chunk of code under `Or for populated dataframes with many rows:` – Ivan86 Dec 19 '17 at 00:43
I know this works because I just copied it from my console. – Ivan86 Dec 19 '17 at 00:44
It did not remove or add any data. – DanielDDore Dec 19 '17 at 00:53
I'll try it again. – DanielDDore Dec 19 '17 at 00:54
remove or add? This is creating a third dataframe with all of the elements from both – Ivan86 Dec 19 '17 at 00:54
You just added that last statement – DanielDDore Dec 19 '17 at 00:54
Oh ok, yes, try the last statement for multirow. – Ivan86 Dec 19 '17 at 00:55
It is not getting rid of the IDs I do not want. – DanielDDore Dec 19 '17 at 01:01
It seems like it is just merging the dfs and being as big as df1 + df 2. – DanielDDore Dec 19 '17 at 01:02
Ok one sec, at first it was not clear what you exactly wanted, now that you edited I get it. I will re edit. – Ivan86 Dec 19 '17 at 01:04
I have been using the portion about multiple rows. – DanielDDore Dec 19 '17 at 01:17
Ok I will retry it – DanielDDore Dec 19 '17 at 01:30
1

I think I got it. A guy told me to use merge with how='inner" – DanielDDore Dec 19 '17 at 02:30
Well this works also. Can you please test to confirm? So you can mark as correct for visitors to read. It's working perfectly on my computer. – Ivan86 Dec 19 '17 at 02:34
Will do. Give me a couple minutes to test it out. It is taking a lot of time in iPython Notebook. – DanielDDore Dec 19 '17 at 02:35
Thanks, I upvoted your answer for useful. Good to know. – Ivan86 Dec 19 '17 at 02:56

Create new df if value in df one column is included in df two same column name

1 Answers1