I have a DataFrame with multiple columns and 700+ rows, and a Series of 29 rows (indexes 0 to 28). I want to add a new column to the DataFrame whose values come from the Series, matched by using an existing column of the DataFrame as the lookup key into the Series index.

This is the DataFrame I have. I need to add the Series values to it, matched on the codes in the "Reason for absence" column:

     ID  Reason for absence  Month of absence  Day of the week  Seasons  
0    11                  26                 7                3        1   
1    36                   0                 7                3        1   
2     3                  23                 7                4        1   
3     7                   7                 7                5        1   
4    11                  23                 7                5        1   
5     3                  23                 7                6        1   
6    10                  22                 7                6        1   
7    20                  23                 7                6        1   
8    14                  19                 7                2        1   
9     1                  22                 7                2        1   
10   20                   1                 7                2        1   
11   20                   1                 7                3        1   
12   20                  11                 7                4        1   
13    3                  11                 7                4        1   
14    3                  23                 7                4        1   
15   24                  14                 7                6        1   
16    3                  23                 7                6        1   
17    3                  21                 7                2        1   
18    6                  11                 7                5        1   
19   33                  23                 8                4        1   
20   18                  10                 8                4        1   
21    3                  11                 8                2        1   
22   10                  13                 8                2        1   
23   20                  28                 8                6        1   
24   11                  18                 8                2        1   
25   10                  25                 8                2        1   
26   11                  23                 8                3        1   
27   30                  28                 8                4        1   
28   11                  18                 8                4        1   
29    3                  23                 8                6        1   
30    3                  18                 8                2        1   
31    2                  18                 8                5        1   
32    1                  23                 8                5        1   
33    2                  18                 8                2        1   
34    3                  23                 8                2        1   
35   10                  23                 8                2        1   
36   11                  24                 8                3        1   
37   19                  11                 8                5        1   
38    2                  28                 8                6        1   
39   20                  23                 8                6        1   
40   27                  23                 9                3        1   
41   34                  23                 9                2        1   
42    3                  23                 9                3        1   
43    5                  19                 9                3        1   
44   14                  23                 9                4        1   

This is the Series, s_conditions:

0                                        Not absent
1                 Infectious and parasitic diseases
2                                         Neoplasms
3                             Diseases of the blood
4     Endocrine, nutritional and metabolic diseases
5                  Mental and behavioural disorders
6                    Diseases of the nervous system
7                               Diseases of the eye
8                               Diseases of the ear
9                Diseases of the circulatory system
10               Diseases of the respiratory system
11                 Diseases of the digestive system
12                             Diseases of the skin
13           Diseases of the musculoskeletal system
14             Diseases of the genitourinary system
15                         Pregnancy and childbirth
16                 Conditions from perinatal period
17                         Congenital malformations
18                Symptoms not elsewhere classified
19                                           Injury
20                                  External causes
21                Factors influencing health status
22                                Patient follow-up
23                             Medical consultation
24                                   Blood donation
25                           Laboratory examination
26                              Unjustified absence
27                                    Physiotherapy
28                              Dental consultation
dtype: object

I tried this:

df1.insert(loc=0, column="Reason_for_absence", value=s_conditions)

Output (this is wrong, because I need the Reason_for_absence column to follow the values in "Reason for absence", looked up in the index of s_conditions):

                                Reason_for_absence  ID  Reason for absence  \
0                                       Not absent  11                  26   
1                Infectious and parasitic diseases  36                   0   
2                                        Neoplasms   3                  23   
3                            Diseases of the blood   7                   7   
4    Endocrine, nutritional and metabolic diseases  11                  23   
5                 Mental and behavioural disorders   3                  23   
6                   Diseases of the nervous system  10                  22   
7                              Diseases of the eye  20                  23   
8                              Diseases of the ear  14                  19   
9               Diseases of the circulatory system   1                  22   
10              Diseases of the respiratory system  20                   1   
11                Diseases of the digestive system  20                   1   
12                            Diseases of the skin  20                  11   
13          Diseases of the musculoskeletal system   3                  11   
14            Diseases of the genitourinary system   3                  23   
15                        Pregnancy and childbirth  24                  14   
16                Conditions from perinatal period   3                  23   
17                        Congenital malformations   3                  21   
18               Symptoms not elsewhere classified   6                  11   
19                                          Injury  33                  23   
20                                 External causes  18                  10   
21               Factors influencing health status   3                  11   
22                               Patient follow-up  10                  13   
23                            Medical consultation  20                  28   
24                                  Blood donation  11                  18   
25                          Laboratory examination  10                  25   
26                             Unjustified absence  11                  23   
27                                   Physiotherapy  30                  28   
28                             Dental consultation  11                  18   
29                                             NaN   3                  23   
30                                             NaN   3                  18   
31                                             NaN   2                  18   
32                                             NaN   1                  23   

I get values only up to row 28 and NaN after that. Instead, I need the Series value that matches the "Reason for absence" code, filled in for every row.

Aarav
  • https://stackoverflow.com/questions/27126511/add-columns-different-length-pandas – mtm1186 Nov 18 '22 at 16:26
  • The prompt is a bit unclear. Are you looking to perform a left join that brings the reason text into the dataframe, with the "Reason for absence" column as the key? If so, you can look at pd.merge() – Carson Whitley Nov 18 '22 at 17:45
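
The left join suggested in the comment above could look roughly like the sketch below. It is only an illustration under assumptions: the three-row frame is a cut-down stand-in for the asker's df1, and the Series is given the name "Reason_for_absence" because merge needs a named Series to create the new column.

```python
import pandas as pd

# Cut-down stand-ins for the asker's df1 and s_conditions (assumed shapes)
df1 = pd.DataFrame({"ID": [11, 36, 3], "Reason for absence": [26, 0, 23]})
s_conditions = pd.Series(
    {0: "Not absent", 23: "Medical consultation", 26: "Unjustified absence"},
    name="Reason_for_absence",  # the name becomes the new column label
)

# Left join: key column on the left, Series index on the right
merged = df1.merge(
    s_conditions,
    how="left",
    left_on="Reason for absence",
    right_index=True,
)
print(merged)
```

With how="left", every row of df1 is kept in its original order, and codes missing from the Series would end up as NaN.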

1 Answer

While the question is a bit confusing, the goal seems to be to match the series index against the dataframe's "Reason for Absence" column. If that is correct, below is a small example of how to do it. Keep in mind that the resulting dataframe will be ordered by the 'Reason for Absence Numerical' column rather than by its original index. If my understanding is incorrect, please clarify the question so we can better assist you.

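import pandas as pd

# Small example frame; 'Reason for Absence Numerical' holds the codes to look up
d = {'ID': [11, 36, 3], 'Reason for Absence Numerical': [3, 2, 1], 'Day of the Week': [4, 2, 6]}
dataframe = pd.DataFrame(data=d)

# Lookup series: index = code, value = description
s = {0: 'Not absent', 1: 'Neoplasms', 2: 'Injury', 3: 'Diseases of the eye'}
disease_series = pd.Series(data=s)


def add_series_to_df(df, series, index_val):
    # Rows of df whose code equals index_val (copy, so the original df is not modified)
    df_filtered = df[df['Reason for Absence Numerical'] == index_val].copy()
    # The single series entry for this code
    series_filtered = series[series.index == index_val]
    if not df_filtered.empty:
        df_filtered['Reason for Absence Text'] = series_filtered.item()
        return df_filtered
    # Falls through and returns None when no row uses this code; pd.concat drops None entries


# Iterate over the actual index labels, so non-contiguous codes also work
x = [add_series_to_df(dataframe, disease_series, index_val) for index_val in disease_series.index]
new_df = pd.concat(x)
print(new_df)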
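
As a possible alternative to the loop above (not the approach the answer itself uses), the same lookup can be sketched with Series.map, which treats the series index as the key. It reuses the example data from the answer and keeps the dataframe in its original row order.

```python
import pandas as pd

d = {'ID': [11, 36, 3], 'Reason for Absence Numerical': [3, 2, 1], 'Day of the Week': [4, 2, 6]}
dataframe = pd.DataFrame(data=d)

s = {0: 'Not absent', 1: 'Neoplasms', 2: 'Injury', 3: 'Diseases of the eye'}
disease_series = pd.Series(data=s)

# Each code is looked up in disease_series.index; codes without a match become NaN
dataframe['Reason for Absence Text'] = dataframe['Reason for Absence Numerical'].map(disease_series)
print(dataframe)
```

Because the column is assigned directly on dataframe, no second dataframe or concat step is needed.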
  • Thanks a lot! Yes, this is exactly what I wanted. What are the df_filtered and series_filtered variables for? Can you please explain, as it will help me understand this better? When I try to update the old df with the new df, it doesn't work. Can I get the result in the same df without creating a new one? I tried the dataframe.update() method but it doesn't work – Aarav Nov 19 '22 at 20:58
  • I ran this code inside a function and got the new column as expected, but I need the result sorted by index, so I used new_df.sort_index(inplace=True). To update the original I tried df.assign(new_df) and df=df.update(new_df), neither of which updated it – Aarav Nov 19 '22 at 21:31
  • The df_filtered and series_filtered variables keep only the df and series rows where index_val equals df['Reason for Absence Numerical'] and series.index, respectively. This allows the text associated with the series index to be populated in the new 'Reason for Absence Text' column. –  Nov 19 '22 at 21:40
  • Thanks for the clarification. How do I update the original dataframe with the new dataframe? When I run the whole code inside another function, it doesn't update the old dataframe. – Aarav Nov 19 '22 at 22:00
  • The “new” df is the “original” df plus the “new” column requested in the original post. I believe this provides the desired solution (based on the provided information). If this is correct, consider marking this issue as answered and then post a new question to address follow-on issues. If the provided response does not address the issue satisfactorily, please include the pertinent scripts and tracebacks so the community can better understand the issue and assist. –  Nov 19 '22 at 23:14
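
On the follow-up about changing the original dataframe from inside a function: DataFrame.update modifies the frame in place and returns None, so df = df.update(new_df) leaves df set to None, and update does not add columns that the original lacks. The sketch below shows one way this might be handled instead. The helper name add_reason_text and the sample data are hypothetical, not taken from the thread.

```python
import pandas as pd


def add_reason_text(df, conditions):
    """Hypothetical helper: add the text column to df in place."""
    # Assigning a column mutates the object the caller passed in
    df["Reason_for_absence"] = df["Reason for absence"].map(conditions)


# Cut-down stand-ins for the asker's df1 and s_conditions (assumed shapes)
df1 = pd.DataFrame({"ID": [11, 36, 3], "Reason for absence": [26, 0, 23]})
s_conditions = pd.Series(
    {0: "Not absent", 23: "Medical consultation", 26: "Unjustified absence"}
)

add_reason_text(df1, s_conditions)  # no return value needed
print(df1)
```

Returning the frame from the function and rebinding it (df1 = some_function(df1)) would work as well; the in-place version is shown only because the comments ask for it.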