
I have two questions. I think solving the first one makes the second one possible.

First question:

I have a DataFrame, df, read with pd.read_csv() from an incomplete CSV. Incomplete means it has many columns but is missing two. The first missing column is stored in another CSV, which I read into df2. It contains only float values and has the same number of rows as df. So the first question is: how do I easily append the df2 column to the columns of df?

I read about df.assign in "Adding new column to existing DataFrame in Python pandas", but that answer uses a Series, not another DataFrame read from a file.
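A minimal sketch, assuming both CSVs were read with pd.read_csv() so that df and df2 both get the default 0..n-1 index and have the same number of rows. The inline DataFrames stand in for the CSV files, and the column name "values" is hypothetical. A single column of df2 is itself a Series, so df.assign works with it; pd.concat(axis=1) is an alternative that glues all of df2's columns on at once.

```python
import pandas as pd

# Stand-ins for the two pd.read_csv(...) results; both get the default RangeIndex.
df = pd.DataFrame({"weight": [80, 75, 70, 90, 100],
                   "height": [175, 160, 159, 180, 210]})
df2 = pd.DataFrame({"values": [0.25, 0.35, 0.50, 0.40, 0.30]})  # hypothetical floats

# df2["values"] is a Series, so df.assign accepts it directly.
df_assigned = df.assign(values=df2["values"])

# Alternative: add every column of df2 to df side by side (axis=1 = columns).
df_concat = pd.concat([df, df2], axis=1)

# Both rely on index alignment: row i of df2 lands on row i of df.
print(df_assigned)
```

If df has been filtered or re-indexed so the two indexes no longer match, df.assign(values=df2["values"].to_numpy()) assigns by position instead of by index label.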

Second question:

The second missing column of df is a factor (categorical) variable that isn't stored anywhere; it has to be generated from conditions on the data in df. In other words: how do I append a generated factor variable column to df?

The factor variable has to consist of three categories/strings ["A","B","C"].

A should only be assigned to the rows of df that hold df2's lowest 10% of values. I'd index them with something like df[df["values"] < df["values"].quantile(0.1)]. Once I have this subset of the "values" column, how do I say: on these rows, put the label "A" in the new factor variable column? What I'm missing is how to assign "A" to that subset in a new column.
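A minimal sketch of the A step, assuming df already contains the df2 column under the hypothetical name "values" and that the new column is called "factor" (also hypothetical). Series.quantile(0.1) gives the 10th-percentile threshold, the comparison builds a boolean mask, and df.loc[mask, "factor"] = "A" writes the label only on the masked rows; the column is created on the fly and the other rows stay NaN.

```python
import pandas as pd

df = pd.DataFrame({"values": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]})

# Threshold at or below which the lowest 10% of "values" fall.
low = df["values"].quantile(0.1)

# Boolean mask: True on the rows in the lowest 10%.
mask_a = df["values"] <= low

# Write "A" only on the masked rows; this creates the "factor" column,
# and rows outside the mask are left as NaN.
df.loc[mask_a, "factor"] = "A"
print(df)
```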

B is for the rows with df2's highest 10% of values. This is the same as A, but for a different subset, probably something like df[df["values"] > df["values"].quantile(0.9)].
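A and B can also be assigned in one step. This sketch uses numpy.select, which takes a list of boolean conditions and a matching list of labels; the column name "values" and the placeholder label "other" for rows matching neither condition are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values": [float(v) for v in range(1, 21)]})  # 1.0 .. 20.0

low = df["values"].quantile(0.1)   # 10th percentile
high = df["values"].quantile(0.9)  # 90th percentile

conditions = [df["values"] <= low, df["values"] >= high]
choices = ["A", "B"]

# Each row gets the label of the first condition it satisfies;
# rows matching none get the (hypothetical) placeholder "other".
df["factor"] = np.select(conditions, choices, default="other")
print(df["factor"].value_counts())
```

Because np.select picks the first matching condition, the order of the lists decides which label wins if two conditions overlap.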

C is for a subset that depends on a third column, stored in df3, with different values. As in question 1, I would first add the df3 column to df and then assign C based on conditions on that newly appended column. Once A and B are solved, I can solve C myself; it is more or less a repeat with different conditions or values.
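A sketch of how C could slot in, assuming df3 holds a single column with the hypothetical name "score" and a hypothetical rule "C where score > 50". The df3 column is appended first, as in question 1, and then C is added as a third condition. A and B are listed first, so they take priority over C on rows that match more than one condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values": [float(v) for v in range(1, 11)]})  # 1.0 .. 10.0
df3 = pd.DataFrame({"score": [65, 60, 70, 10, 80, 20, 55, 30, 90, 95]})  # hypothetical

# Step 1 (as in question 1): append the df3 column to df.
df["score"] = df3["score"]

low = df["values"].quantile(0.1)
high = df["values"].quantile(0.9)

conditions = [
    df["values"] <= low,   # A: lowest 10% of values
    df["values"] >= high,  # B: highest 10% of values
    df["score"] > 50,      # C: hypothetical rule on the df3 column
]
df["factor"] = np.select(conditions, ["A", "B", "C"], default="other")
print(df)
```

Rows 0 and 9 have a score above 50 but keep A and B, because np.select uses the first matching condition.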

Best, Pure

EDIT: Example:

df1:

weight height
80 175
75 160
70 159
90 180
100 210

df2 (or df3), single-column DataFrames:

age
25
35
50
40
30

Factor variable column (it could look like this if A is for age < 40 and B for age >= 40). I don't have this column! I want to generate it from the age. The generated column would look like this:

sex
A
A
B
B
A
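For this simplified two-label rule, a minimal sketch with numpy.where is enough: one condition, the label for True and the label for False. The column names follow the example above.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"age": [25, 35, 50, 40, 30]})

# "A" where age < 40, otherwise "B" (the rule from the example).
sex = pd.Series(np.where(df2["age"] < 40, "A", "B"), index=df2.index, name="sex")
print(sex)
```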

df1 after appending everything:

weight height age sex
80 175 25 A
75 160 35 A
70 159 50 B
90 180 40 B
100 210 30 A
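A sketch that reproduces this final table from df1 and df2, assuming the same rule (A below 40, B from 40 up). It uses pd.cut for the labels, which returns a pandas Categorical, the closest pandas equivalent to a factor variable; right=False makes the bins [-inf, 40) and [40, inf).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"weight": [80, 75, 70, 90, 100],
                    "height": [175, 160, 159, 180, 210]})
df2 = pd.DataFrame({"age": [25, 35, 50, 40, 30]})

# Question 1: append the single column of df2 to df1.
df1 = pd.concat([df1, df2], axis=1)

# Question 2: bin age into labelled categories; 40 itself falls into "B".
df1["sex"] = pd.cut(df1["age"], bins=[-np.inf, 40, np.inf],
                    labels=["A", "B"], right=False)
print(df1)
```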

Possible operation afterwards: a 2D plot of weight against height for sex A.
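A minimal sketch of that plot with matplotlib, assuming the completed df1 from the example: filter on the factor column with a boolean mask, then scatter the remaining rows. The non-interactive "Agg" backend is only there so the sketch runs without a display; in an interactive session you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({"weight": [80, 75, 70, 90, 100],
                    "height": [175, 160, 159, 180, 210],
                    "age": [25, 35, 50, 40, 30],
                    "sex": ["A", "A", "B", "B", "A"]})

# Keep only the rows labelled "A".
subset = df1[df1["sex"] == "A"]

fig, ax = plt.subplots()
ax.scatter(subset["height"], subset["weight"])
ax.set_xlabel("height")
ax.set_ylabel("weight")
ax.set_title("sex A")
```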

This is a very simplified example. My real values are not weight/height, and they are not integers.

  • Please add a [MRE]. – Michael Szczesny Oct 25 '21 at 10:02
  • Lots of good stuff here: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas To add a column from one df to another you can do this: df1["colName"] = df2["colName"] – pseudoabdul Oct 25 '21 at 10:17
  • Thanks @pseudoabdul, this would solve question 1. But for question 2, my factor variable doesn't exist yet and has to be generated from the conditions. –  Oct 25 '21 at 11:02
  • In the link I provided, it shows you how to apply a function to each row in a dataframe. You'd simply write your condition into that function, and then apply that function over the dataframe to produce your second column. – pseudoabdul Oct 26 '21 at 01:16
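The row-wise approach suggested in the comment above can be sketched with DataFrame.apply(..., axis=1), which passes each row to a function as a Series. The function label_row is hypothetical and encodes the example rule (A below age 40, otherwise B).

```python
import pandas as pd

df1 = pd.DataFrame({"weight": [80, 75, 70, 90, 100],
                    "height": [175, 160, 159, 180, 210],
                    "age": [25, 35, 50, 40, 30]})

def label_row(row):
    """Hypothetical rule from the example: "A" below age 40, otherwise "B"."""
    return "A" if row["age"] < 40 else "B"

# axis=1 calls label_row once per row; the returned labels form the new column.
df1["sex"] = df1.apply(label_row, axis=1)
print(df1)
```

Row-wise apply is easy to extend with arbitrary conditions, but on large frames it is typically much slower than vectorized masks such as numpy.where or numpy.select.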
