I am trying to create a correlation matrix in R, but I am having problems. Most tutorials use very small datasets, whereas I need to select 8 rows from a big dataset, plus one more variable that is the average of two rows. I am not sure how to select specific rows. Can someone help me with that? I would really appreciate any help.

Someone asked me for a sample:

"NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.1577, NA, 0.2197, NA, 0.348, NA, 0.086, NA, NA, NA, NA, NA, NA, NA, NA, 0.3768, NA, 0.2163, NA, 0.336, NA, 0.329, NA, NA, NA, NA, NA, NA, NA, NA, 0.2881, NA, 0.0632, NA, 0.235, NA, 0.167, NA, NA, NA, NA, NA, NA, NA, NA, 0.2076, NA, 0.3705, NA, 0.164, NA, 0.255, NA, NA, NA, NA, NA, NA, NA, NA, 0.1795, NA, 0.3649, NA, 0.246, NA, 0.628, NA, NA, NA, NA, NA, NA, NA, NA, 0.0227, NA, 0.3975, NA, 0.176, NA, 0.13, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.6333, NA, 0.3627, NA, 0.603, NA, 0.408, NA, NA, NA, NA, NA, NA, NA, NA, 0.6667, NA, 0.8889, NA, 0.6, NA, 0.6, NA, NA, NA, NA, NA, NA, NA, NA, 0.0545, NA, 0.2547, NA, 0.431, NA, 0.126, NA, NA, NA, NA, NA, NA, NA, NA, 0.2388, NA, 0.5514, NA, 0.32, NA, 0.424, NA, NA, NA, NA, NA, NA, NA, NA, 0.6667, NA, 0.3867, NA, 0.313, NA, 0.75, NA, NA, NA, NA, NA, NA, NA, NA, 0.752, NA, 0.482, NA, 0.349, NA, 0.24, NA, NA, NA, NA, NA, NA, NA, NA, 0.5161, NA, 0.641, NA, 0.643, NA, 0.438, NA, NA, NA, NA, NA, NA, NA, NA, 0.3492, NA, 0.3, NA, 0.391, NA, 0.645, NA, NA, NA, NA, NA, NA, NA, NA, 0.3531, NA, 0.5755, NA, 0.667, NA, 0.751, NA, NA, NA, NA, NA, NA, NA, NA, 0.2941, NA, 0.5119, NA, 0.294, NA, 0.526, NA, NA, NA, NA, NA, NA, NA, NA, 0.2941, NA, 0.1515, NA, 0.3, NA, 0.124, NA, NA, NA, "

  • We can help you out, surely, but first you need to share with us some sample of your data. [Fictitious data would work as well](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – patL Jul 30 '19 at 12:51
  • To select rows is the main doubt? `mat[i, ]` or `mat[i1:i2, ]`. – Rui Barradas Jul 30 '19 at 12:54
  • @patL I have a huge data set (9240 obs. and 110 variables). Can you tell me how I give a sample of my data? I tried using dput and other commands in the text but look at what I get in the original post. – Kristijonas Medelis Jul 30 '19 at 12:59
  • @RuiBarradas But what if there are 110 rows? Do I still have to look at which row is where? Or is there other way to do the correlation matrix by using the names of the rows. Sorry if this is a stupid question but I am really getting nowhere with internet tutorials. – Kristijonas Medelis Jul 30 '19 at 13:03
  • 1) If your data set is big, a subset could be `dput(data[1:20, 1:10])`. 2) You can subset by row names. See [here](https://stackoverflow.com/questions/23475257/how-to-select-rows-in-a-table-whose-row-names-match-any-element-from-a-character/23475387#23475387), [here](https://stackoverflow.com/questions/18933187/how-to-select-some-rows-with-specific-rownames-from-a-dataframe) or [here](https://stackoverflow.com/questions/23652566/how-can-i-select-a-row-by-row-name-in-a-subsetted-data-frame-in-r). – Rui Barradas Jul 30 '19 at 13:54
  • @KristijonasMedelis Please see the linked example of how to share data, and see Rui Barradas's comments. You can use `dput` to share your data. – patL Jul 31 '19 at 06:14

1 Answer

Let's say you have a file with 110 rows and 84 columns (110 × 84 = 9240 cells in total).

To read your file (if your data is stored in a file):

data <- read.csv("file.txt", header = TRUE, sep = "\t")

`read.csv` already returns a data frame, so no extra `data.frame()` call is needed. Set `header = TRUE` if the first line of your file contains column names; otherwise use `header = FALSE`.
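
As a small illustration of the `header` argument, the sketch below reads the same tab-separated text twice through the `text` argument of `read.csv` instead of a real file. The column names `a` and `b` and the values are made up for the example.

```r
# Hypothetical tab-separated content standing in for "file.txt"
txt <- "a\tb\n1\t2\n3\t4"

# header = TRUE: the first line supplies the column names
with_header <- read.csv(text = txt, header = TRUE, sep = "\t")

# header = FALSE: the first line is treated as data and the
# columns get the default names V1, V2
no_header <- read.csv(text = txt, header = FALSE, sep = "\t")

names(with_header)  # column names taken from the first line
nrow(no_header)     # one row more than with_header
```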

Now select any rows according to your need.

For 1 row only (with all columns):

your_df <- data[1,]

For rows 1 to 10 only (with all columns):

your_df <- data[1:10,]

For rows 1, 3 and 10 (with all columns):

your_df <- data[c(1,3,10),]
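
If the rows have names, they can also be picked by name instead of by position, so there is no need to look up where each row sits. A minimal sketch, assuming a made-up data frame with hypothetical row names:

```r
# Made-up data frame; the row names are hypothetical
data <- data.frame(score = c(0.2, 0.5, 0.9, 0.4),
                   row.names = c("alpha", "beta", "gamma", "delta"))

# Select rows by name; drop = FALSE keeps the result a data frame
# even though there is only one column
picked <- data[c("gamma", "alpha"), , drop = FALSE]

rownames(picked)  # rows come back in the order they were requested
```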

Similarly, you can select columns by putting their indices after the comma inside the square brackets. For the 3rd row and 2nd column:

your_df <- data[3,2]
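
One detail worth knowing: indexing a single cell this way returns the bare value rather than a data frame. The sketch below, on a made-up data frame, shows the difference when `drop = FALSE` is added.

```r
# Made-up data frame for illustration
data <- data.frame(x = c(1.5, 2.5, 3.5), y = c(10, 20, 30))

cell    <- data[3, 2]                # a plain numeric value
cell_df <- data[3, 2, drop = FALSE]  # still a 1 x 1 data frame

is.data.frame(cell)     # FALSE
is.data.frame(cell_df)  # TRUE
```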

For rows 1 to 10 and columns 51 to 60:

your_df <- data[1:10,51:60]
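
To connect this back to the question, here is a hedged sketch of selecting variables by name, adding one that is the average of two others, and passing the result to `cor`. It assumes the variables are stored as columns; the names `v1` to `v4` and the random data are invented. `use = "pairwise.complete.obs"` is one possible way to cope with `NA` values like those in the sample, not the only one.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical data set: 20 observations of 4 numeric variables
data <- data.frame(v1 = rnorm(20), v2 = rnorm(20),
                   v3 = rnorm(20), v4 = rnorm(20))
data$v1[c(2, 5)] <- NA  # a few missing values, as in the sample

# Select the variables of interest by name
wanted <- c("v1", "v2", "v3")
sub <- data[, wanted]

# Extra variable: the row-wise average of two existing variables
sub$v3_v4_mean <- rowMeans(data[, c("v3", "v4")])

# Correlation matrix; each pair of variables uses the rows
# where both values are present
cor_mat <- cor(sub, use = "pairwise.complete.obs")
dim(cor_mat)
```

With real data, `wanted` would hold the eight variable names of interest, and `rowMeans` would be given the two variables to average.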
kashiff007