Sub-setting csv structure based upon the value in last row

Question

I have read in a .csv file that has 785 rows and 24217 columns.

my_data <- read.csv("test.csv", header = FALSE)

I want to select the columns where the value in row 785 is 6. So I tried the following:

my_data[my_data[1:24217][785,] == 6]

The result is a jumble of values from mixed-up columns, like a list. How can I get a result that still has the same structure as the csv/my_data? All I want is to subset my_data to the columns where the value in the 785th row is 6.

Sample set:

      V1       V2       V3       V4    .   .   .     .    . N th Column
1   0.000000 0.000000 0.000000
2   0.000000 0.000000 0.000000
3   0.000000 0.000000 0.000000
4   0.000000 0.000000 0.000000
5   0.000000 0.000000 0.000000
6   0.000000 0.000000 0.000000
7   0.000000 0.000000 0.000000
8   0.000000 0.000000 0.000000
9   0.000000 0.000000 0.000000
10  0.000000 0.000000 0.000000
.
.
.
785  0.000000 6.000000 5.50000

As you can see in the sample above, there are 785 rows and N columns, which in my case go up to 24217. I want to subset the data based on the value in the 785th row, that is, keep only the columns whose last value (in row 785) is 6.

This may have been answered already here: https://stackoverflow.com/questions/14782206/how-do-i-change-a-single-value-in-a-data-frame — Dodge, Mar 20 '18 at 17:46
Have you actually read the question? I do not believe OP is trying to change a value of anything,...rather, this seems to be a subsetting question. — pyll, Mar 20 '18 at 17:49
@W.Dodge I am sorry, but I do not think this is a duplicate. — Jatt, Mar 20 '18 at 17:50
maybe it would be helpful if you provided a small sample df and a desired result... — pyll, Mar 20 '18 at 17:52
ok, so in your updated example...your desired output is all 785 rows, but only one column ('V2')...assuming all the other columns do not have a 6 in the 785th row. correct? — pyll, Mar 20 '18 at 18:04
@pyll Yeah. I want all columns where the value in 785th row is `6`. There would be many columns where value in 785th row would be `6` — Jatt, Mar 20 '18 at 18:06

pyll · Accepted Answer · 2018-03-20T19:15:40.747

# Sample data looks like this.
# Multiple columns V1-VN (only 4 here for example)

V1 <- c(0, 0, 0, 0, 0, 0, 0)
V2 <- c(0, 0, 0, 0, 0, 0, 6)
V3 <- c(0, 0, 0, 0, 0, 0, 5.5)
V4 <- c(0, 0, 0, 0, 0, 0, 6)
df <- data.frame(V1, V2, V3, V4)

# You want to subset the data by
# keeping the columns which have specific value (6)
# in a certain row (Row 7 here for simplicity)

newdf <- df[,df[7,]==6]

# So try this for your problem
newmy_data <- my_data[,my_data[785,]==6]
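One thing worth checking with this pattern: if only a single column matches, `[` on a data frame drops the result to a plain vector by default, so the csv-like structure is lost. A minimal sketch on a hypothetical toy frame (the names `keep`, `as_vector` and `as_frame` are only for illustration), assuming you always want a data frame back:

```r
# Hypothetical toy frame: only V2 has a 6 in the last row.
df <- data.frame(
  V1 = c(0, 0, 0),
  V2 = c(0, 0, 6),
  V3 = c(0, 0, 5.5)
)

last <- nrow(df)

# Turn the last row into a plain vector, then compare: one TRUE/FALSE per column.
keep <- unlist(df[last, ]) == 6

as_vector <- df[, keep]                # a single match collapses to a numeric vector
as_frame  <- df[, keep, drop = FALSE]  # drop = FALSE keeps a one-column data frame

is.data.frame(as_vector)
is.data.frame(as_frame)
```

For the real data the same idea would read `my_data[, unlist(my_data[785, ]) == 6, drop = FALSE]`.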

Update: a request in the comments for multiple conditions led to this version, which uses %in%:

newdf <- df[,df[7,] %in% c(6, 5.5)]
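For comparison, here is a small sketch of two ways to write the multi-value condition on a toy frame. It first flattens the last row with `unlist()` so the comparison runs on a plain numeric vector, which is an assumption on my part rather than something the line above requires. The names `last_row`, `keep_in` and `keep_or` are made up for the example.

```r
# Toy frame with 3 rows; the last row plays the role of row 785.
df <- data.frame(
  V1 = c(0, 0, 0),
  V2 = c(0, 0, 6),
  V3 = c(0, 0, 5.5),
  V4 = c(0, 0, 6)
)

# Last row as a plain numeric vector, one entry per column.
last_row <- unlist(df[nrow(df), ])

# Option 1: membership test against a set of accepted values.
keep_in <- last_row %in% c(6, 5.5)

# Option 2: element-wise OR. Note the single `|`; the double `||` is meant
# for single TRUE/FALSE values and does not work column by column.
keep_or <- last_row == 6 | last_row == 5.5

# drop = FALSE keeps the result a data frame even if only one column matches.
by_in <- df[, keep_in, drop = FALSE]
by_or <- df[, keep_or, drop = FALSE]
```

Both selections should keep the same columns here. `%in%` tends to read better once there are more than two accepted values.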

edited Mar 20 '18 at 19:15

answered Mar 20 '18 at 17:55


Updated the sample set in question – Jatt Mar 20 '18 at 18:03
I think, that is fine. Could you please explain what exactly have you done and how is it different from what I have tried (as given in the question)? – Jatt Mar 20 '18 at 18:30
Sure. The first thing is actually assigning the new subset to a df, which is what `newdf <-` is doing. Then, it's just a matter of getting the logic right. I am starting with `df`, then I want all rows, but only the columns where my condition is true. – pyll Mar 20 '18 at 18:33
Why is it necessary to wrap `my_data[785,]==6` in `c`? – Jatt Mar 20 '18 at 18:44
Actually...it looks like it's not. I was thinking it had to return a list of columns. but was wrong. Updated answer. – pyll Mar 20 '18 at 18:46
Okay. I had approached it like `my_data[my_data[1:24217][785,] == 6]`. Is it an incorrect way? – Jatt Mar 20 '18 at 18:50
it's not necessary to specify the columns like that. In fact, that seems to be concatenating the columns into a long list. the answer i gave provides the desired format – pyll Mar 20 '18 at 18:55
If I need to also get rows with value 5. Will the following be okay? `my_data[,my_data[785,]==6 || my_data[785,]==5]`? – Jatt Mar 20 '18 at 19:04
No. Try this. `newmy_data <- my_data[,my_data[785,] %in% c(6, 5)]` – pyll Mar 20 '18 at 19:14
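To illustrate the difference between the two approaches discussed in these comments, here is a small sketch on a toy frame. As far as I understand `[` on data frames, a call with no comma and a logical matrix as the index falls back to matrix-style indexing over all cells, so the row-785 mask gets recycled down the whole table instead of picking columns. The variable names are hypothetical.

```r
# Toy frame: 3 rows, 4 columns; the last row stands in for row 785.
df <- data.frame(
  V1 = c(0, 0, 0),
  V2 = c(0, 0, 6),
  V3 = c(0, 0, 5.5),
  V4 = c(0, 0, 6)
)
last <- nrow(df)

# Same shape as the attempt in the question: no comma inside the outer brackets.
# The 1 x 4 logical mask is applied to individual cells, not to columns.
no_comma <- df[df[seq_along(df)][last, ] == 6]

# With the comma, the mask is used as a column selector and all rows are kept.
with_comma <- df[, df[last, ] == 6]

is.data.frame(no_comma)    # cell values pulled out into a flat vector
is.data.frame(with_comma)  # columns V2 and V4, structure preserved
```

So the leading comma is the important part: it says "all rows, these columns".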
