
There might be a simple solution to this but I'm struggling.

I have the following code:

for (i in 1:nrow(df)) {
  x[i] <- df[i, ]$X4
  if (length(unique(df[1:i, ]$X4)) == length(unique(df$X4))) {
    break
  }
  collect <- data.frame(df[1, ]$X1, df[1, ]$X2, df[i + 1, ]$X3)
}

The loop breaks once the condition length(unique(df[1:i,]$X4)) == length(unique(df$X4)) is met, i.e. once rows 1 to i contain every distinct value of X4. Instead of exiting, I want to restart the same check from row i+1 and keep going until the condition is met again, repeating this until the end of my data frame.
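The "restart from row i+1" idea can be sketched without `break`: keep a running record of the values seen since the last block ended, and reset it whenever every distinct value has appeared. This is only a minimal sketch; `block_ends` is a hypothetical helper and the toy vector is made up for illustration.

```r
# Hypothetical helper: returns the row index at which each complete block ends.
# A block is complete once it contains every distinct value of x.
block_ends <- function(x) {
  n_unique <- length(unique(x))
  seen <- character(0)    # values seen since the last block ended
  ends <- integer(0)
  for (i in seq_along(x)) {
    seen <- union(seen, as.character(x[i]))
    if (length(seen) == n_unique) {
      ends <- c(ends, i)    # the block ends at row i ...
      seen <- character(0)  # ... and the next one starts at row i + 1
    }
  }
  ends
}

# Toy example: the blocks are rows 1-4 and 5-7; row 8 alone is incomplete.
block_ends(c("a", "b", "a", "c", "b", "a", "c", "b"))
# [1] 4 7
```

Resetting `seen` plays the role of starting the loop again from row i+1, so no `break` or `next` is needed.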

My sample data is as follows:

1  930000    1000000   E2-A
1  1890000   2110000   E2-A
1  2120000   2330000   D
1  2340000   3350000   E2-B
1  3365000   3405000   B
1  5695000   5810000   E2-A
1  6305000   6405000   E2-B
1  6425000   6465000   E1-A
1  6780000   6960000   E2-B
1  7100000   7270000   D
1  7730000   7810000   D
1  8030000   8380000   E2-A
1  8970000   9170000   E1-A
1  9345000   9555000   E1-B
1  9845000   9930000   E1-A
1  10000000  10100000  E1-B
1  10430000  10560000  E3
1  11720000  11780000  B
1  11900000  11960000  C
1  12185000  12270000  E1-A
1  12450000  12680000  A     # break point of the loop, because length(unique(df[1:i,]$X4)) == length(unique(df$X4))
1  13990000  14290000  B
1  15250000  15355000  E2-B
1  15475000  15600000  D
1  15655000  15755000  A
1  15920000  16080000  E2-A
1  16120000  16280000  C
1  16400000  16570000  E1-B
1  17280000  17380000  E1-B
1  17450000  17735000  A
1  17760000  17820000  E1-B
1  17825000  17935000  A
1  18925000  19150000  E1-A
1  19220000  19410000  C
1  19680000  19980000  C
1  20230000  20820000  E3    # the if condition is met again here, but break has already exited the loop
1  20845000  20970000  E2-A
1  21580000  21695000  D
1  21700000  21920000  E2-A
1  22430000  22750000  B
1  22740000  22980000  A
1  23300000  23515000  C
1  23870000  23965000  A
1  24525000  24720000  E2-B
1  25010000  25160000  D
1  25170000  25430000  B
1  25930000  26130000  A
1  26220000  26330000  E2-B
1  26435000  26485000  C
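A copy/pastable version of the sample data above can be built with `read.table(text = ...)`. This sketch drops the `#` notes and assumes the column names X1 to X4 used in the code.

```r
# Sample data from the question, readable without an external file.
df <- read.table(text = "
1 930000 1000000 E2-A
1 1890000 2110000 E2-A
1 2120000 2330000 D
1 2340000 3350000 E2-B
1 3365000 3405000 B
1 5695000 5810000 E2-A
1 6305000 6405000 E2-B
1 6425000 6465000 E1-A
1 6780000 6960000 E2-B
1 7100000 7270000 D
1 7730000 7810000 D
1 8030000 8380000 E2-A
1 8970000 9170000 E1-A
1 9345000 9555000 E1-B
1 9845000 9930000 E1-A
1 10000000 10100000 E1-B
1 10430000 10560000 E3
1 11720000 11780000 B
1 11900000 11960000 C
1 12185000 12270000 E1-A
1 12450000 12680000 A
1 13990000 14290000 B
1 15250000 15355000 E2-B
1 15475000 15600000 D
1 15655000 15755000 A
1 15920000 16080000 E2-A
1 16120000 16280000 C
1 16400000 16570000 E1-B
1 17280000 17380000 E1-B
1 17450000 17735000 A
1 17760000 17820000 E1-B
1 17825000 17935000 A
1 18925000 19150000 E1-A
1 19220000 19410000 C
1 19680000 19980000 C
1 20230000 20820000 E3
1 20845000 20970000 E2-A
1 21580000 21695000 D
1 21700000 21920000 E2-A
1 22430000 22750000 B
1 22740000 22980000 A
1 23300000 23515000 C
1 23870000 23965000 A
1 24525000 24720000 E2-B
1 25010000 25160000 D
1 25170000 25430000 B
1 25930000 26130000 A
1 26220000 26330000 E2-B
1 26435000 26485000 C
", col.names = c("X1", "X2", "X3", "X4"))

str(df)  # 49 rows; X4 has 9 distinct values
```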

My expected output is:

1  930000    12680000
1  13990000  20820000

But what I get so far is:

1  930000    12680000

How can I do this?

rishi
  • try `next` instead of `break` – Prem May 05 '18 at 08:30
  • @Prem, doesn't work. `next` would skip my `if condition`. – rishi May 05 '18 at 08:34
  • You may want to show us the reproducible example along with the desired output. – Prem May 05 '18 at 08:35
  • @Prem added the link to my old question, which has the example and the desired output :) – rishi May 05 '18 at 08:41
  • It would help not only you but also us to provide prompt, quality help if you provided a copy/pastable, self-contained example. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for many examples. – Roman Luštrik May 05 '18 at 08:47
  • @RomanLuštrik yes ofc, I understand :) I edited my question and added a hyperlink to my old question, with the example and desired output! – rishi May 05 '18 at 08:50
  • Note also that I mentioned copy/pastable, which means we copy & paste the example into our R session and we're on our way. Your example involves a lot of preprocessing just to get the data in. Some people might be dissuaded by that. – Roman Luštrik May 05 '18 at 08:53
  • @RomanLuštrik you're absolutely right. Apologies for not making it clear in the first place. I'll make sure from now on if I post a question, I provide the easiest sample data to work on. Thanks for the constructive criticism! :) – rishi May 05 '18 at 12:22

2 Answers

# I saved the data you provided into a file and read it back into the R session.
# "#" starts a comment, so the notes after the break-point rows are ignored;
# stringsAsFactors = TRUE reproduces the factor column shown by str() below
# (since R 4.0.0 the default is FALSE).
df <- read.table("df.txt", comment.char = "#", stringsAsFactors = TRUE)

# It looks like you are using X1, X2, and so on in your code example,
# so I renamed the columns
names(df) <- c("X1", "X2", "X3", "X4")

# check the structure of the data frame
str(df)
# 'data.frame': 49 obs. of  4 variables:
#  $ X1: int  1 1 1 1 1 1 1 1 1 1 ...
#  $ X2: int  930000 1890000 2120000 2340000 3365000 5695000 6305000 6425000 6780000 7100000 ...
#  $ X3: int  1000000 2110000 2330000 3350000 3405000 5810000 6405000 6465000 6960000 7270000 ...
#  $ X4: Factor w/ 9 levels "A","B","C","D",..: 7 7 4 8 2 7 8 5 8 4 ...

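result <- list()
i.new <- 1   # first row of the current block
j <- 0       # number of complete blocks found so far
# number of unique values in the 4th column
n.unique <- length(unique(df$X4))
for (i in seq(nrow(df))) {
  # does the current block (rows i.new to i) contain every unique value?
  if (length(unique(df[i.new:i, "X4"])) == n.unique) {
    j <- j + 1
    result[[j]] <- c(df[i.new, 2], df[i, 3])  # start (X2) and end (X3) of the block
    i.new <- i + 1                            # restart the check from the next row
  }
}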

result
# [[1]]
# [1]   930000 12680000
# 
# [[2]]
# [1] 13990000 20820000



# If a data frame is needed:
do.call(rbind.data.frame, result)
#c.930000L..13990000L. c.12680000L..20820000L.
#1                930000                12680000
#2              13990000                20820000

# If a matrix is OK:
matrix(unlist(result, use.names=F), ncol = 2, byrow = TRUE)
#         [,1]     [,2]
#[1,]   930000 12680000
#[2,] 13990000 20820000
Katia
  • `> do.call(rbind.data.frame, result) data frame with 0 columns and 0 rows` :( Doesn't work – rishi May 06 '18 at 09:50
  • It looks like your initial variable has a different structure. I added at the beginning of my code how I arrived at my data. Compare the output of the str() function between my data and yours. – Katia May 06 '18 at 10:56
  • My first column was characters, so I changed them to integer, as yours. And I changed my last column to factors as well, to match yours. Yet the error remains. Should we take this discussion to chat? – rishi May 06 '18 at 13:09
  • Sure, we can work through chat. I have to do some errands right now, but will answer you in chat later. – Katia May 06 '18 at 13:16
  • I don't have 100 reputation haha, so I can't create a chatroom. – rishi May 06 '18 at 13:26
  • What I would do is to create a new question. Provide a simple dataset (just a few rows) with only those columns that you need to answer your new question. For simplicity I would use letters like A, B, C to mark your source and destination. and then formulate a question and give an output that you would like to see for the input you provided. Also assume that the original problem is solved and so give the input for this new question. – Katia May 06 '18 at 13:31
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/170531/discussion-between-rishi-and-katia). – rishi May 07 '18 at 11:38
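On the `data frame with 0 columns and 0 rows` comment: `do.call(rbind.data.frame, list())` returns exactly that empty data frame, so one possible cause is that the loop never met its condition and `result` stayed empty. Below is a minimal sketch of a more explicit conversion, assuming `result` is a list of length-2 vectors as built by the loop above; the `result` values are copied from the answer's printed output, and the column names `start` and `end` are my own choice.

```r
result <- list(c(930000L, 12680000L), c(13990000L, 20820000L))

if (length(result) == 0) {
  # No block ever contained every distinct X4 value
  warning("result is empty: no complete block was found")
  out <- data.frame(start = integer(0), end = integer(0))
} else {
  # rbind the vectors into a 2-column matrix, then name the columns
  out <- as.data.frame(do.call(rbind, result))
  names(out) <- c("start", "end")
}
out
#      start      end
# 1   930000 12680000
# 2 13990000 20820000
```

Checking `length(result)` first makes an empty result visible instead of silently producing a 0-column data frame.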
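ulist <- unique(df$X4)   # distinct values of X4
uniq  <- length(ulist)   # number of distinct values
brk   <- 0               # last row of the previous complete block
rn    <- nrow(df)
coor  <- NULL            # one row (start, end) per complete block

while (brk < rn) {
  found  <- rep(0, uniq)     # found[j] becomes 1 once ulist[j] appears in the current block
  coor.s <- df$X2[brk + 1]   # block starts at X2 of the row after the last break
  coor.e <- 0

  for (i in (brk + 1):rn) {
    for (j in 1:uniq) {
      if (df$X4[i] == ulist[j]) { found[j] <- 1 }
    }
    if (sum(found) == uniq) { coor.e <- df$X3[i]; brk <- i; break }
  }

  if (sum(found) < uniq) {
    break                                   # remaining rows form an incomplete block: stop
  } else {
    coor <- rbind(coor, c(coor.s, coor.e))  # keep every block, not just the last one
  }
}
collect.df <- as.data.frame(coor)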
rishi
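Both answers can be wrapped in a reusable function that also keeps X1, as in the expected output. This is a sketch, not either author's code: `find_blocks` is a hypothetical name, it assumes columns named X1 to X4, and it drops an incomplete trailing block. The toy data frame is made up for illustration.

```r
# Hypothetical helper: one row per block that contains every distinct X4 value.
# Output columns: X1 and X2 of the block's first row, X3 of its last row.
find_blocks <- function(df) {
  n_unique <- length(unique(df$X4))
  start <- 1                      # first row of the current block
  rows <- list()
  for (i in seq_len(nrow(df))) {
    if (length(unique(df$X4[start:i])) == n_unique) {
      rows[[length(rows) + 1]] <- data.frame(X1 = df$X1[start],
                                             X2 = df$X2[start],
                                             X3 = df$X3[i])
      start <- i + 1              # restart the check from the next row
    }
  }
  if (length(rows) == 0) {
    return(data.frame(X1 = integer(0), X2 = integer(0), X3 = integer(0)))
  }
  do.call(rbind, rows)
}

# Toy data: blocks are rows 1-4 (a, b, a, c) and rows 5-7 (b, c, a).
toy <- data.frame(X1 = 1L,
                  X2 = c(10L, 20L, 30L, 40L, 50L, 60L, 70L),
                  X3 = c(15L, 25L, 35L, 45L, 55L, 65L, 75L),
                  X4 = c("a", "b", "a", "c", "b", "c", "a"))
find_blocks(toy)
#   X1 X2 X3
# 1  1 10 45
# 2  1 50 75
```

Applied to the question's data, `find_blocks(df)` should return the two rows of the expected output (930000 to 12680000 and 13990000 to 20820000).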