I have two questions; I think the solution to the first one makes the second one possible.
First question:
I have a DataFrame, df, read with pd.read_csv() from an incomplete CSV. Incomplete means it has many columns but is missing two. The first missing column is stored in another CSV, which I read into df2. It contains only float values and has the same number of rows as df. So the first question is: how can I easily append the df2 column to the columns of df?
I read about df.assign in "Adding new column to existing DataFrame in Python pandas", but that uses a Series and not another read-in DataFrame.
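To make the first question concrete, here is a minimal sketch of two ways this could work, assuming both frames have the same number of rows in the same order. The frames and the column name "values" are made-up stand-ins for what pd.read_csv() would return, not my real data.

```python
import pandas as pd

# Hypothetical stand-ins for the frames that would come from pd.read_csv().
df = pd.DataFrame({"weight": [80, 75, 70], "height": [175, 160, 159]})
df2 = pd.DataFrame({"values": [0.5, 1.5, 2.5]})

# Option 1: concatenate side by side. pd.concat aligns on the index, so both
# indexes are reset first to make the alignment purely positional.
combined = pd.concat(
    [df.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

# Option 2: assign the single column directly. Converting to a NumPy array
# drops df2's index labels, so the values are placed row by row.
df["values"] = df2["values"].to_numpy()
```

Both options should end up with the same three columns. Option 2 modifies df in place, while Option 1 returns a new DataFrame.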
Second question:
The second missing column of df is a factor variable column which isn't stored anywhere and has to be generated from conditions on the data in df. In other words: how do I append a factor variable column to df?
The factor variable has to consist of three categories/strings ["A","B","C"].
A should only be applied to the rows of df that fall in the lowest 10% of df2's values. I imagine indexing along the lines of df[df["values"] < df["values"]*0.1], although I am not sure that condition is right. Once I have this subset of the right column ("values"), how do I say: for these rows, insert the label "A" into the new factor variable column? What I am missing is how to assign "A" to that subset in a new column.
B is for the highest 10% of df2's values. Same as A, but for a different subset, likely df[df["values"] > df["values"]*0.9].
C is for a subset that depends on a third column, stored in "df3", with different values. As in question 1, df3 would first be added to df, and the label would then be set based on conditions on that newly appended column. Once "A" and "B" are solved, I can solve "C" myself, since it is more or less a repeat with different conditions or values.
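As a rough sketch of the A/B labelling, assuming "lowest 10%" and "highest 10%" mean values below the 10th percentile and above the 90th percentile: note that df["values"]*0.1 compares each value with a fraction of itself, so the sketch uses Series.quantile for the cutoffs instead. The data and the column name "factor" are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "values" stands for the column appended from df2.
df = pd.DataFrame(
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]}
)

# Cutoffs for the lowest and highest 10% of the values.
low = df["values"].quantile(0.10)
high = df["values"].quantile(0.90)

# Start with an empty label column, then fill in the matching rows.
df["factor"] = None
df.loc[df["values"] < low, "factor"] = "A"
df.loc[df["values"] > high, "factor"] = "B"

# Optionally store the labels as a categorical (factor-like) column.
# Rows that matched no condition stay missing.
df["factor"] = pd.Categorical(df["factor"], categories=["A", "B", "C"])
```

The same df.loc[condition, "factor"] = "C" pattern would then apply to the third category, with the condition built from the df3 column.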
Best, Pure
EDIT: Example:
df1:
weight | height |
---|---|
80 | 175 |
75 | 160 |
70 | 159 |
90 | 180 |
100 | 210 |
df2 (or df3) single column dataframes:
age |
---|
25 |
35 |
50 |
40 |
30 |
Factor variable column (for example, A if age < 40 and B if age >= 40). I don't have this column! I want to generate it depending on the age. The generated column would look like this:
sex |
---|
A |
A |
B |
B |
A |
df1 after appending everything:
weight | height | age | sex |
---|---|---|---|
80 | 175 | 25 | A |
75 | 160 | 35 | A |
70 | 159 | 50 | B |
90 | 180 | 40 | B |
100 | 210 | 30 | A |
A possible operation now: plot weight against height in 2D, for sex A only.
This is a very simplified example. My real values are not weight/height, and they are not integers.
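For reference, the simplified example can be reproduced with the sketch below. It assumes the rows of df1 and df2 are in the same order, and it uses numpy.where for the two-way split at age 40. The name subset_a is only for illustration.

```python
import numpy as np
import pandas as pd

# The example frames from the tables above.
df1 = pd.DataFrame(
    {"weight": [80, 75, 70, 90, 100], "height": [175, 160, 159, 180, 210]}
)
df2 = pd.DataFrame({"age": [25, 35, 50, 40, 30]})

# Append the age column by position.
df1["age"] = df2["age"].to_numpy()

# Two-way label: "A" where age < 40, otherwise "B".
df1["sex"] = np.where(df1["age"] < 40, "A", "B")

# Rows with label "A", e.g. as input for a weight/height plot.
subset_a = df1[df1["sex"] == "A"]
```

With matplotlib installed, subset_a.plot.scatter(x="weight", y="height") would then draw the 2D plot for sex A.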