0

I have a nested data in one long column. The data represents the different items in stock of various stores. Every item has 6 values of No, Item, Expiry Date, Manufacturer, Quantity, Unit Price, Discount and Total Price. The items are listed numerically 1,2,3 and then it starts for a new store 1,2 and then for the next 1,2,3 etc. My goal is to collect all the items of one store in one row and whenever there is a 1 start a new row


1
ABC RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
2
ABC RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
1
BCD
Amp
Apr 2024
XYZ
280
$2.00 
$0 
$ 2 555.0
2
BCD RH --
Amp
Mar 2024
ABC D --
280
$5.20 
$0 
$ 1 456.0
3
BCD RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
1
1. HJK
Pack
Mar 2023
D --
80
$0.20 
$0 
$ 3 456.0
2
2. BCD
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
3
BCD RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
4
BCD RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
5
BCD RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0
6
BCD RH --
Pack
Mar 2022
ABC D --
280
$5.20 
$0 
$ 1 456.0

and here is what I'm trying to display it as

 |No |Item  | Size |Exp     |Manuf |Qty  |Unit |Dis |Total    |No| Item|
 |1  |ABC RH| Pack |Mar 2022| ABC D| 280 |$5.20|$0  |$ 1 456.0|2 | ABC |
 |1  |BCD RH| Amp  |Apr 2024| XYZ  | 280 |$2.00|$0  |$ 2 555.0|2 | BCD |
 |1  |1. HJK| Pack |Mar 2023| D -- | 80  |$0.20|$0  |$ 3 456.0|2.| BCD |

Thank you for your help

Kat
  • 15,669
  • 3
  • 18
  • 51
  • 1
    Can you describe more precisely how you want to the output? Should it be raw text as shown in your example? – s_baldur Oct 20 '21 at 09:05
  • 1
    You could improve your chances of finding help here by adding a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Adding a MRE and an example of the desired output (in code form, not tables and pictures) makes it much easier for others to find and test an answer to your question. That way you can help others to help you! P.S. Here is [a good overview on how to ask a good question](https://stackoverflow.com/help/how-to-ask) – dario Oct 20 '21 at 09:05
  • Thank you for your reply @sindri_baldur, I want to save it as csv. I'm using RStudio – Samia Abdelhafiz Oct 20 '21 at 09:11
  • 1
    Could you explain the logic that connect input to output? – s__ Oct 20 '21 at 09:15
  • The number '1' represents the start of a new row. The numbers 2,3,4 etc represent the elements of the row. Every time there is a 1, every cell following it should go in the same row until it finds another 1, in which case it should start a new row – Samia Abdelhafiz Oct 20 '21 at 09:24
  • You understand that `csv` is meant for tabular data? – s_baldur Oct 20 '21 at 09:42
  • Yes CSV is meant for tabular data, and that is my final goal. To save the data as a table – Samia Abdelhafiz Oct 20 '21 at 09:48
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Oct 20 '21 at 09:53

2 Answers2

1

You can try a tidyverse

library(tidyverse)
as.data.frame(df) %>% 
  mutate(V2 = cumsum(df == "1")) %>% 
  group_by(V2) %>%
  mutate(V4 = 1:n()) %>% 
  pivot_wider(names_from = V4, values_from = df, names_prefix = "Item") %>% 
  ungroup() %>% 
  select(-V2)
  # A tibble: 3 x 9
  Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1     A     B     C     2     A     B     NA    NA   
2 1     A     B     C     D     NA    NA    NA    NA   
3 1     A     2     A     B     C     3     A     B   
Roman
  • 17,008
  • 3
  • 36
  • 49
  • But why is there a "NA" value? I've tried to clarify my question. The data has quantity (A), type (B) and expiry (C) for every item. Sorry for the not providing a workable example data beforehand – Samia Abdelhafiz Oct 20 '21 at 09:50
  • So, what value do you want instead of NA's? You need some value for the empty cells when the output have to be a data.frame/table. – Roman Oct 20 '21 at 09:56
  • Finally Salvation!!!! Thank you for your help. I believe I could remove the NAs afterwards. Sorry to bother you again, while you're at it, how could I rename the column Items using Quantity, Type and Expiry? Which is a going to repeat for every item in the row – Samia Abdelhafiz Oct 20 '21 at 10:03
  • Try to add this `values_fill = ""` to the `pivot_wider` part. I don't understand how you want to rename the columns. Please update your expected output table. – Roman Oct 20 '21 at 10:08
  • I've been trying to update the table but the format is gone. But I hope it gives you an idea about naming the rows. I'm very grateful for your help – Samia Abdelhafiz Oct 20 '21 at 10:34
0

A possible solution using base R:

df=c('1','A','B','C','2','A','B','C','1','A','B','C','1','A','B','C','2','A','B','C','3','A','B','C')
df <- data.frame(V=df)

idx <- which(df$V == 1)
n <- length(idx)

leng <- idx[-1]-idx[-n]
leng <- c(leng,length(df$V)-idx[n]+1)

l <- split(df$V, rep(1:n, leng))
l <- lapply(l, `length<-`, max(lengths(l)))

as.data.frame(do.call(rbind,l))
PaulS
  • 21,159
  • 2
  • 9
  • 26