
I am using the code below to produce the following result in Python, and I want the equivalent of this code in R. Here N is a column of the data frame data. The CN column is calculated from the values of column N following a specific pattern, and it gives me the following result in Python.


+---+----+
| N | CN |
+---+----+
| 0 |  0 |
| 1 |  1 |
| 1 |  1 |
| 2 |  2 |
| 2 |  2 |
| 0 |  3 |
| 0 |  3 |
| 1 |  4 |
| 1 |  4 |
| 1 |  4 |
| 2 |  5 |
| 2 |  5 |
| 3 |  6 |
| 4 |  7 |
| 0 |  8 |
| 1 |  9 |
| 2 | 10 |
+---+----+

A short overview of my code:

import numpy as np
import pandas as pd

data = pd.read_table(filename, skiprows=15, decimal=',', sep='\t', header=None, names=["Date ", "Heure ", "temps (s) ", "X", "Z", " LVDT V(mm) ", "Force normale (N) ", "FT", "FN(N) ", "TS", " NS(kPa) ", "V (mm/min)", "Vitesse normale (mm/min)", "e (kPa)", "k (kPa/mm) ", "N ", "Nb cycles normal", "Cycles ", "Etat normal", "k imposÈ (kPa/mm)"])

# Strip the stray spaces from the column names
data.columns = [col.strip() for col in data.columns.tolist()]
N = data[data.keys()[15]]
N = np.array(N)
# CN goes up by 1 every time N differs from the previous row
data["CN"] = (data.N.shift().bfill() != data.N).astype(int).cumsum()
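For readers who do not use pandas, the CN line above can be read as two steps: flag every row whose N differs from the row before it, then take a running total of the flags. Below is a minimal sketch of that reading in plain Python. The helper name change_counter is hypothetical and not part of my real code, and the N values are copied from the table above.

```python
from itertools import accumulate


def change_counter(values):
    """Return 0 for the first run of equal values, then +1 at every change."""
    values = list(values)
    if not values:
        return []
    # 1 where a value differs from its predecessor; the first row is always 0
    flags = [0] + [int(cur != prev) for prev, cur in zip(values, values[1:])]
    # The running total of the flags is the counter
    return list(accumulate(flags))


# N column copied from the table in the question
N = [0, 1, 1, 2, 2, 0, 0, 1, 1, 1, 2, 2, 3, 4, 0, 1, 2]
CN = change_counter(N)
print(CN)
```

The leading 0 plays the same role as shift().bfill() in the pandas version: the first row has no predecessor, so it never counts as a change.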

An example of data.head():


+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
| Index |    Date     |   Heure    | temps (s) |    X     |  Z(mm)   | LVDT V(mm) | Force normale (N) |    FT     |    FN(N)    | FT (kPa)  |  NS(kPa)   | V (mm/min) | Vitesse normale (mm/min) |  e (kPa)   | k (kPa/mm) |  N  | Nb cycles normal | Cycles | Etat normal | k imposÈ (kPa/mm) | CN |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
|   184 | 01/02/2022  |  12:36:52  |   402.163 | 6.910243 | 1.204797 | 0.001101   |        299.783665 | 31.494351 | 1428.988908 | 11.188704 | 505.825016 |        0.1 |                      2.0 | 512.438828 |  50.918786 | 0.0 |              0.0 |  Sort  |  Monte      |               0.0 |  0 |
|   185 | 01/02/2022  |  12:36:54  |   404.288 | 6.907822 | 1.205647 | 4.9e-05    |        296.072718 | 31.162313 | 1404.195316 | 11.028167 |  494.97955 |        0.1 |                     -2.0 | 500.084986 |  49.685639 | 0.0 |              0.0 |  Sort  |  Descend    |               0.0 |  0 |
|   186 | 01/02/2022  |  12:36:56  |   406.536 | 6.907906 | 1.204194 | -0.000214  |        300.231424 | 31.586401 | 1429.123486 |  11.21895 | 505.750815 |        0.1 |                      2.0 | 512.370164 |  50.914002 | 0.0 |              0.0 |  Sort  |  Monte      |               0.0 |  0 |
|   187 | 01/02/2022  |  12:36:58  |   408.627 | 6.910751 | 1.204293 | -0.000608  |        300.188686 | 31.754064 | 1428.979519 | 11.244542 | 505.624564 |        0.1 |                      2.0 | 512.309254 |  50.906544 | 0.0 |              0.0 |  Sort  |  Monte      |               0.0 |  0 |
|   188 | 01/02/2022  |  12:37:00  |   410.679 | 6.907805 | 1.205854 | -0.000181  |        296.358074 | 31.563389 | 1415.224427 | 11.129375 | 502.464948 |        0.1 |                      2.0 | 510.702313 |  50.742104 | 0.0 |              0.0 |  Sort  |  Monte      |               0.0 |  0 |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+

Rui Barradas
shah nawaz
  • What have you tried in R and where have you run into trouble? Do you have sample data to share for R? Best if you can provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) highlighting what you've tried, where you are stuck, and how we can help. – caldwellst Feb 14 '22 at 10:48
  • Welcome to SO! You maximise your chance of getting a useful answer if you provide a minimal reproducible example. [This post](https://stackoverflow.com/help/minimal-reproducible-example) may help. Here, you need to specify more precisely what your "specific pattern" is. Looking at your data I *think* you might be looking for some run length encoding: take a look at the online help for `rle()`. – Limey Feb 14 '22 at 10:49
  • I have not tried anything in R so far because I am new to R, just one week in. Actually, I wanted to import my Python code into R, but this code is inside a function, and `py_to_r` from the reticulate package does not give me the variables inside the function. – shah nawaz Feb 14 '22 at 11:15
  • Please don't post data like [that](https://stackoverflow.com/questions/52023709/what-can-r-do-about-a-messy-data-format). – Rui Barradas Feb 14 '22 at 11:38

1 Answer


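A one-line cumsum trick solves it: diff() flags each row where N differs from the previous row, the leading 0L stands in for the first row, and cumsum() counts the flags.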

cumsum(c(0L, diff(df1$N) != 0))
#>  [1]  0  1  1  2  2  3  3  4  4  4  5  5  6  7  8  9 10

all.equal(
  cumsum(c(0L, diff(df1$N) != 0)),
  df1$CN
)
#> [1] TRUE

Created on 2022-02-14 by the reprex package (v2.0.1)

Data

x <- "
+---+----+
| N | CN |
+---+----+
| 0 |  0 |
| 1 |  1 |
| 1 |  1 |
| 2 |  2 |
| 2 |  2 |
| 0 |  3 |
| 0 |  3 |
| 1 |  4 |
| 1 |  4 |
| 1 |  4 |
| 2 |  5 |
| 2 |  5 |
| 3 |  6 |
| 4 |  7 |
| 0 |  8 |
| 1 |  9 |
| 2 | 10 |
+---+----+"
df1 <- read.table(textConnection(x), header = TRUE, sep = "|", comment.char = "+")[2:3]


Rui Barradas
  • Thanks for the code, but I am not getting column CN as an additional column in the data frame by using the above code. How can I do that? – shah nawaz Feb 14 '22 at 14:13
  • Problem solved. Actually, the first line of code was doing the actual work. I don't know what the lower lines are doing, if you can explain. I assigned the first line to a variable and added it as a column in the data frame. Thanks – shah nawaz Feb 14 '22 at 14:16
  • @shahnawaz The second line is checking if your expected output is equal to my code's output, nothing more. It's the first line that you want. – Rui Barradas Feb 14 '22 at 14:59