
I have a df from which I split off some columns for scaling and PCA. I ran the analysis on the continuous numerical columns, and now I am trying to put everything back together: the 2 categorical columns followed by the scaled data.

Both dfs have the same number of rows, and there are no null values in this practice analysis. I have tried to concat them in several different ways, and each time I get the correct number of rows but lots of null values that make no sense. The code is as follows:

Note: `categorical_columns_df` holds the categorical columns.

Note: `scaled_df` holds the scaled data, which corresponds row for row to the categorical columns data.

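```python
import pandas as pd

# Combine the categorical columns with the scaled numeric columns.
# No axis argument is passed, so pandas uses its default (axis=0).
dfs_to_concat = [categorical_columns_df, scaled_df]
new_df = pd.concat(dfs_to_concat)
new_df
```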

```text
      Time        Baby_ID  AGE_Under-1  AGE_Under-2  AGE_Under-3  AGE_Under-4  Input (X)  Output (y)  HR        Height    Weight
0     3:00:00 AM  1.0      1.0          0.0          0.0          0.0          NaN        NaN         NaN       NaN       NaN
1     4:00:00 AM  1.0      1.0          0.0          0.0          0.0          NaN        NaN         NaN       NaN       NaN
2     5:00:00 AM  1.0      1.0          0.0          0.0          0.0          NaN        NaN         NaN       NaN       NaN
3     6:00:00 AM  1.0      1.0          0.0          0.0          0.0          NaN        NaN         NaN       NaN       NaN
4     7:00:00 AM  1.0      1.0          0.0          0.0          0.0          NaN        NaN         NaN       NaN       NaN
...   ...         ...      ...          ...          ...          ...          ...        ...         ...       ...       ...
2751  NaN         NaN      NaN          NaN          NaN          NaN          0.604396   0.60        0.532895  0.642857  0.642857
2752  NaN         NaN      NaN          NaN          NaN          NaN          0.615385   0.61        0.559211  0.642857  0.642857
2753  NaN         NaN      NaN          NaN          NaN          NaN          0.626374   0.62        0.578947  0.642857  0.642857
2754  NaN         NaN      NaN          NaN          NaN          NaN          0.615385   0.63        0.572368  0.642857  0.642857
2755  NaN         NaN      NaN          NaN          NaN          NaN          0.604396   0.62        0.559211  0.642857  0.642857
```

What is going on here? What am I doing wrong in the code that the top rows are null in half of the columns and the bottom rows are null in the other half? I have concatenated many dfs before and never run into this problem. Any insight is appreciated.
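A minimal reproducible sketch of the symptom, with tiny made-up frames standing in for the real data (column names borrowed from the output above, values invented). It assumes the two frames share the same default index, as a pair of frames split from one df usually would. With no `axis` argument, `pd.concat` uses `axis=0`: it appends the second frame's rows below the first frame's rows and takes the union of the columns, so each block is NaN in the other frame's columns.

```python
import pandas as pd

# Made-up stand-ins for the question's frames: two rows each, same index.
categorical_columns_df = pd.DataFrame(
    {"Time": ["3:00:00 AM", "4:00:00 AM"], "Baby_ID": [1.0, 1.0]}
)
scaled_df = pd.DataFrame(
    {"HR": [0.532895, 0.559211], "Height": [0.642857, 0.642857]}
)

# No axis argument: pd.concat uses axis=0 and stacks scaled_df's rows
# underneath categorical_columns_df's rows, taking the union of columns.
stacked = pd.concat([categorical_columns_df, scaled_df])

print(stacked.shape)           # (4, 4): twice as many rows as expected
print(stacked.index.tolist())  # [0, 1, 0, 1]: index labels are repeated
print(stacked)                 # top block NaN in HR/Height, bottom block NaN in Time/Baby_ID
```

Because the index labels repeat, the last label still looks like the "correct" row count even though `shape` shows the rows have doubled.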

  • You could try adding the `axis=1` argument to the `pd.concat` call. – Michael Butscher Jan 30 '23 at 17:28
  • *"I get the correct number of rows"* -- Are you only looking at the index to determine that? If so, I bet you actually have twice as many rows as you should. Anyway, that's beside the point. You just forgot `axis=1`. I added some links to existing questions about this. – wjandrea Jan 30 '23 at 17:38
  • I did forget to put that in, but I had tried it. I found the issue: one of the dfs I was using still had the outlier rows in it, and they all happened to be at the end of the df with high index values, so when I viewed the concatenated new_df, I was seeing nulls for the removed outliers. So I had the syntax right, I was just using the wrong source, and you are correct that I needed the axis. Thanks for chiming in. All's well that ends well. – Wildo_Baggins311 Jan 30 '23 at 17:38
  • Cool cool, good to hear that. BTW, for future questions, please make a [reproducible pandas example](/q/20109391/4518341), which can help catch issues like this. – wjandrea Jan 30 '23 at 17:39
  • Thanks for the reference! I need all the references to well-explained stuff I can get. – Wildo_Baggins311 Jan 30 '23 at 17:51
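A sketch of the corrected call along the lines the comments describe, again with small made-up frames. It assumes, as in the resolution above, that one frame still contains rows (here, the hypothetical labels 2 and 3) that were dropped from the other as outliers. `axis=1` places the frames side by side, but `pd.concat` matches rows by index label rather than by position, so labels present in only one frame still come out as NaN. Selecting the same labels from both frames removes them.

```python
import pandas as pd

# Made-up data: scaled_df was built after dropping the "outlier" rows
# labelled 2 and 3, but categorical_columns_df still contains them.
categorical_columns_df = pd.DataFrame({"Baby_ID": [1.0, 1.0, 1.0, 2.0, 2.0]})
scaled_df = pd.DataFrame({"HR": [0.53, 0.56, 0.58]}, index=[0, 1, 4])

# axis=1 needs matching index labels; equal lengths alone are not enough.
print(categorical_columns_df.index.equals(scaled_df.index))  # False

# axis=1 joins the frames column-wise, aligning rows on index labels.
# Labels 2 and 3 exist only on the left, so HR is NaN for those rows.
side_by_side = pd.concat([categorical_columns_df, scaled_df], axis=1)
print(side_by_side)

# Keep only the labels that survived outlier removal, in both frames.
aligned = pd.concat(
    [categorical_columns_df.loc[scaled_df.index], scaled_df], axis=1
)
print(aligned)  # 3 rows (labels 0, 1, 4), no NaN
```

Calling `reset_index(drop=True)` on both frames is the other common fix, but it pairs rows purely by position, so it only helps when both frames already hold the same rows in the same order.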
