
I am working on data analysis and need to normalize my data. I used the following commands for normalization: `z1 <- data.Normalization(hfiltered, type = "n10", normalization = "row", na.rm = FALSE)`

and `z <- data.Normalization(hfiltered, type = "n1", normalization = "column", na.rm = FALSE)`. But I don't understand the difference between column normalization and row normalization.

  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Feb 18 '20 at 07:02



This is too long for a comment, so I'm posting it as an answer. I'm not sure it fully answers your question, but let's break it down into steps:

Normalization:

There are many ways to normalize data; one of them is min-max normalization, `(x_i - min(x)) / (max(x) - min(x))`, where `x` is a feature and `x_i` is an individual value of that feature. (Note that in R, `range(x)` returns `c(min(x), max(x))`, so the denominator is `diff(range(x))`, not `range(x)` itself.) Min-max normalization puts every value on a scale from 0 to 1; other normalization methods use different scales (z-scores, for example, are centered on 0 and unbounded). Because the features are then on the same scale, they become easier to compare.
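As a quick illustration, here is a minimal base-R sketch of min-max normalization for a single numeric vector. The function name `min_max` and the sample vector are made up for this example; it is not part of any package.

```r
# Hypothetical helper: min-max normalization of one numeric vector to [0, 1].
# range() returns c(min, max), so the denominator is max - min.
# Note: a constant vector gives 0/0 = NaN, because max - min is 0.
min_max <- function(x, na.rm = FALSE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

x <- c(2, 4, 6, 10)
min_max(x)  # returns 0, 0.25, 0.5, 1
```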

Column Normalization:

Column normalization normalizes each feature (column) independently of the others. It is the more common choice, and the more meaningful one, before PCA, k-means, or other algorithms that are sensitive to the scale of the features; it can also help models converge faster in deep learning.
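A small base-R sketch of column normalization using z-scores with `scale()`, which centers each column by its own mean and divides by its own standard deviation. If I read the clusterSim documentation correctly, this corresponds to `type = "n1"` (standardization) with `normalization = "column"` in `data.Normalization()`, but check `?data.Normalization` to confirm. The matrix is made-up example data.

```r
# Made-up example data: 4 samples (rows) x 3 features (columns)
x <- matrix(c(1, 2, 3, 4,        # f1
              10, 20, 30, 40,    # f2 = f1 * 10 (same pattern, different units)
              5, 5, 6, 8),       # f3
            ncol = 3,
            dimnames = list(paste0("s", 1:4), c("f1", "f2", "f3")))

# Column normalization: each column uses its own mean and sd
z <- scale(x, center = TRUE, scale = TRUE)

round(colMeans(z), 10)       # every column mean is (numerically) 0
apply(z, 2, sd)              # every column sd is 1, up to floating-point error
all.equal(z[, "f1"], z[, "f2"])  # f1 and f2 become identical once the scale is removed
```

The last line shows the point of column normalization: a feature measured in larger units no longer dominates just because its numbers are bigger.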

Row Normalization:

Row normalization is more delicate and less common. It mainly makes sense when the features in a row are counts or are otherwise unit-less (or share the same unit); rows whose features have different units are usually not suitable for row normalization. For example, suppose the data contain many samples, one per row, and each sample is split across 5 features/groups. Row normalization then tells you the proportion of each group within a given sample.
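A minimal base-R sketch of the count example above, assuming each row is a sample and each of the 5 columns is a group count (the numbers are made up). Dividing every value by its row total turns counts into within-sample proportions. As far as I know, this is what `type = "n10"` (quotient transformation, x / sum) with `normalization = "row"` does in clusterSim, but please verify with `?data.Normalization`.

```r
# Made-up count data: 3 samples (rows) x 5 groups (columns)
counts <- matrix(c(10, 20, 30, 40,  0,
                    5,  5,  5,  5, 80,
                    1,  2,  3,  4, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3), paste0("group", 1:5)))

# Row normalization: divide each row by its own total
props <- sweep(counts, 1, rowSums(counts), "/")

rowSums(props)      # each sample now sums to 1
props["sample1", ]  # 0.1 0.2 0.3 0.4 0.0 (sample1 has 100 counts in total)
```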

Normalizing within a column keeps the relative information between rows intact, but normalizing within a row sometimes does not. For example, suppose your features are customer age and income. If you normalize each column, the fact that customer A is younger or older than customer B, and earns more or less, still holds after normalization. After row normalization that comparison may no longer hold, which is a kind of information loss.
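To make the customer example concrete, here is a small base-R sketch with made-up numbers for two hypothetical customers A and B. It applies min-max scaling per column and division by the row total per row, then checks whether A still looks like the higher earner.

```r
# Hypothetical customers: age in years, income in currency units
m <- matrix(c(60, 60000,
              20, 50000),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("age", "income")))

# Column normalization: min-max within each feature
col_norm <- apply(m, 2, function(x) (x - min(x)) / (max(x) - min(x)))

# Row normalization: divide each customer's values by that customer's row total
row_norm <- sweep(m, 1, rowSums(m), "/")

# Does customer A still appear to earn more than customer B?
a_earns_more <- function(x) x["A", "income"] > x["B", "income"]
a_earns_more(m)         # TRUE  (60000 > 50000)
a_earns_more(col_norm)  # TRUE  (1 > 0)
a_earns_more(row_norm)  # FALSE (60000/60060 is smaller than 50000/50020)
```

The row-normalized comparison flips because years and currency are added into the same row total, which is exactly why rows mixing different units are a poor fit for row normalization.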

This is well explained here

PKumar
  • Regarding a time-series classification problem, where my input `x` is a univariate time series and my output `y` is the class I want to predict: should I perform row normalization (between `0` and `1`, for example) on each row independently? – Murilo Mar 20 '23 at 11:40