
I have read a .csv file that has 785 rows and 24217 columns:

my_data <- read.csv("test.csv", header = FALSE)

I want to select the columns where the value in the 785th row is 6, so I tried the following:

my_data[my_data[1:24217][785,] == 6]

The result is a jumble of values from different columns, flattened like a list. How can I get a result that keeps the structure of the csv/my_data? All I want is to subset my_data to the columns where the value in the 785th row is 6.

Sample set:

      V1       V2       V3       V4    .   .   .     .    . N th Column
1   0.000000 0.000000 0.000000
2   0.000000 0.000000 0.000000
3   0.000000 0.000000 0.000000
4   0.000000 0.000000 0.000000
5   0.000000 0.000000 0.000000
6   0.000000 0.000000 0.000000
7   0.000000 0.000000 0.000000
8   0.000000 0.000000 0.000000
9   0.000000 0.000000 0.000000
10  0.000000 0.000000 0.000000
.
.
.
785  0.000000 6.000000 5.50000

As you can see, the set above has 785 rows and N columns, which in my case go up to 24217. I want to subset the data based on the value in the 785th row, i.e. keep only the columns whose last value (in row 785) is 6.

Jatt
  • This may have been answered already here: https://stackoverflow.com/questions/14782206/how-do-i-change-a-single-value-in-a-data-frame – Dodge Mar 20 '18 at 17:46
  • Have you actually read the question? I do not believe OP is trying to change a value of anything,...rather, this seems to be a subsetting question. – pyll Mar 20 '18 at 17:49
  • @W.Dodge I am sorry, but I do not think this is a duplicate. – Jatt Mar 20 '18 at 17:50
  • maybe it would be helpful if you provided a small sample df and a desired result... – pyll Mar 20 '18 at 17:52
  • @pyll Updated.. – Jatt Mar 20 '18 at 18:00
  • ok, so in your updated example...your desired output is all 785 rows, but only one column ('V2')...assuming all the other columns do not have a 6 in the 785th row. correct? – pyll Mar 20 '18 at 18:04
  • @pyll Yeah. I want all columns where the value in 785th row is `6`. There would be many columns where value in 785th row would be `6` – Jatt Mar 20 '18 at 18:06

1 Answer

# Sample data looks like this.
# Multiple columns V1-VN (only 4 here for example)

V1 <- c(0, 0, 0, 0, 0, 0, 0)
V2 <- c(0, 0, 0, 0, 0, 0, 6)
V3 <- c(0, 0, 0, 0, 0, 0, 5.5)
V4 <- c(0, 0, 0, 0, 0, 0, 6)
df <- data.frame(V1, V2, V3, V4)

# You want to subset the data by
# keeping the columns which have specific value (6)
# in a certain row (Row 7 here for simplicity)

newdf <- df[,df[7,]==6]

# So try this for your problem
newmy_data <- my_data[,my_data[785,]==6]

Update: a request for multiple conditions led to this version:

newdf <- df[,df[7,] %in% c(6, 5.5)]
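Two edge cases may be worth guarding against with the approach above: if only one column matches, `df[, cond]` returns a plain vector rather than a data frame, and an `NA` in the target row makes the `==` comparison return `NA`. The following is a minimal sketch, not part of the original answer; it reuses the toy layout (7 rows, target row 7), and the names `last_row`, `only_six`, `six_or_five` and `same_with_in` are made up for illustration. It also shows the elementwise `|` as an alternative to `%in%` for two conditions.

```r
# Toy data in the same shape as the answer's example (values are illustrative).
df <- data.frame(
  V1 = c(0, 0, 0, 0, 0, 0, 0),
  V2 = c(0, 0, 0, 0, 0, 0, 6),
  V3 = c(0, 0, 0, 0, 0, 0, 5.5),
  V4 = c(0, 0, 0, 0, 0, 0, 5)
)

# Pull the target row out as a plain numeric vector, one value per column.
last_row <- unlist(df[7, ])

# Single condition. which() skips NA comparisons, and drop = FALSE keeps
# the result a data frame even when only one column matches.
only_six <- df[, which(last_row == 6), drop = FALSE]

# Two conditions: use the elementwise | (not ||, which expects single values).
six_or_five <- df[, which(last_row == 6 | last_row == 5), drop = FALSE]

# Same selection written with %in%, as in the answer's update.
same_with_in <- df[, last_row %in% c(6, 5), drop = FALSE]
```

For the real data, the same pattern would use `my_data` and row 785 in place of `df` and row 7.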
pyll
  • Updated the sample set in question – Jatt Mar 20 '18 at 18:03
  • I think, that is fine. Could you please explain what exactly have you done and how is it different from what I have tried (as given in the question)? – Jatt Mar 20 '18 at 18:30
  • Sure. The first thing is actually assigning the new subset to a df, which is what `newdf <-` is doing. Then, it's just a matter of getting the logic right. I am starting with `df`, then I want all rows, but only the columns where my condition is true. – pyll Mar 20 '18 at 18:33
  • Why is it necessary to wrap `my_data[785,]==6` in `c`? – Jatt Mar 20 '18 at 18:44
  • Actually...it looks like it's not. I was thinking it had to return a list of columns. but was wrong. Updated answer. – pyll Mar 20 '18 at 18:46
  • Okay. I had approached it like `my_data[my_data[1:24217][785,] == 6]`. Is it an incorrect way? – Jatt Mar 20 '18 at 18:50
  • it's not necessary to specify the columns like that. In fact, that seems to be concatenating the columns into a long list. the answer i gave provides the desired format – pyll Mar 20 '18 at 18:55
  • If I need to also get rows with value 5. Will the following be okay? `my_data[,my_data[785,]==6 || my_data[785,]==5]`? – Jatt Mar 20 '18 at 19:04
  • No. Try this. `newmy_data <- my_data[,my_data[785,] %in% c(6, 5)]` – pyll Mar 20 '18 at 19:14
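As to why the attempt in the question came back jumbled: comparing a data frame row with `==` gives a logical matrix, and indexing a data frame with a single matrix argument (no comma) extracts individual cells instead of selecting columns. A small sketch on made-up 3x3 data, where `mask`, `flat` and `kept` are illustrative names:

```r
# Small illustrative data frame; row 3 plays the role of row 785.
df <- data.frame(
  V1 = c(1, 2, 0),
  V2 = c(3, 4, 6),
  V3 = c(5, 7, 6)
)

# Comparing a one-row data frame yields a 1 x 3 logical matrix.
mask <- df[3, ] == 6

# No comma: the data frame is converted to a matrix and the short logical
# mask is recycled over every cell, so the result is a flat vector of
# values picked from several columns.
flat <- df[mask]

# With a comma: the mask selects whole columns and the frame shape is kept.
kept <- df[, mask, drop = FALSE]
```

Here `flat` is an ordinary numeric vector, while `kept` is still a data frame with all three rows and only the matching columns.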