0

I'm new in programming in R, and I've been having this problem for several days now. I started with a list, I created from splitting a file. This list contains a number of items in a single row.

head(sales2)
> $`7143443`
>>[1] "SSS-U-CCXVCSS1"   "L-CCX-8GETTS-LIC"

>$`7208993`
>>[1] "NFFGSR4="  "1MV-FT-1="  "VI-NT/TE=" 

>$`7241758`
>>[1] "PW_SQSGG="

>$`9273628`
>>[1] "O1941-SE9" "CCO887VA-K9"    "2901-SEC/K9" "CO1941-C/K9"

>$`9371709`
>>[1] "HGR__SASS=" "WWQTTB0S-L" "WS-RRRT48FP" "WTTTF24PS-L" 
[5] "GEDQTT8TS-L"  "WD-SRNS-2S-L" 

>$`9830473`
>>[1] "SPA$FFSB0S"

I wanted it to convert into a data frame , I used

x<-do.call(rbind, lapply(sales2,data.frame))

It gets converted in the data frame ,but it converts like this

> head(x,6)

                          id
> 7143443.1   "SSS-U-CCXVCSS1"

> 7143443.2   "L-CCX-8GETTS-LIC"

> 7208993.1    "NFFGSR4="  

> 7208993.2    "1MV-FT-1="  

> 7208993.3    "VI-NT/TE="

> 7241758      "PW_SQSGG="

I want 7143443's all item in a single row not in multiple row

Through this I want to calculate how many rows contain 2 items together for example "WS-C2960S-48TS-L" , "WS-C2960S-24TS-L", these 2 elements are there in how many rows? You can also say probability of these over all other elements.

Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
Tapan
  • 3
  • 2
  • 2
    Please give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos Apr 18 '17 at 13:54
  • One way could be: http://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows But first you have to collaps each character vector to a string. – jogo Apr 18 '17 at 13:57
  • 1
    However, this might get you started, `do.call(rbind, lapply(sales2, toString))` which is pretty much what @jogo is suggesting – Sotos Apr 18 '17 at 13:59
  • Hi Sotos , Final output should be like Id 7143443 product "SSS-U-CCXVCSS1" , "L-CCX-8GETTS-LIC" – Tapan Apr 18 '17 at 17:05
  • Hi @jogo ** do.call(rbind, lapply(sales2, toString)) ** it didn't work – Tapan Apr 18 '17 at 18:09
  • 1
    `newdf <- data.frame(n=names(sales2), s=sapply(sales2, paste0, collapse = ';'), stringsAsFactors = FALSE)` – jogo Apr 18 '17 at 18:15
  • Thank you@ jogo for your help . I have another issue I want to find which two products are in single row . When i try to do it with s2<- nrow (subset(newdf, s == "CISCO2901-V/K9" & s == "WS-C2960-48TC-L" ) . the system only counts when they are together, if they are with another item ,it neglects that row . My main purpose is to count the no of rows having these two together. Can you pls help me with that – Tapan Apr 19 '17 at 14:26

1 Answers1

1

I am not sure what is your final desired output. But the following script can convert your list to a data frame. Perhaps you can begin your analysis from this data frame.

# Create example list
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
            `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
            `7241758` = "PW_SQSGG=",
            `9273628` = c("O1941-SE9", "CCO887VA-K9", "2901-SEC/K9", "CO1941-C/K9"),
            `9371709` = c("HGR__SASS=", "WWQTTB0S-L", "WS-RRRT48FP", "WTTTF24PS-L",
                          "GEDQTT8TS-L", "WD-SRNS-2S-L"),
            `9830473` = "SPA$FFSB0S")

# Load packages
library(dplyr)
library(purrr)

dat <- map(sales2, data_frame) %>%                    # Convert each list element to a data frame
  bind_rows(.id = "ID") %>%                           # Combine all data frame
  rename(Value = `.x[[i]]`) %>%                       # Change the name of the second column
  group_by(ID) %>%                                    # Group by the first column
  summarise(Value = paste0(Value, collapse = " "))    # Collapse the second column

dat
# A tibble: 6 × 2
       ID                                                                  Value
    <chr>                                                                  <chr>
1 7143443                                        SSS-U-CCXVCSS1 L-CCX-8GETTS-LIC
2 7208993                                           NFFGSR4= 1MV-FT-1= VI-NT/TE=
3 7241758                                                              PW_SQSGG=
4 9273628                          O1941-SE9 CCO887VA-K9 2901-SEC/K9 CO1941-C/K9
5 9371709 HGR__SASS= WWQTTB0S-L WS-RRRT48FP WTTTF24PS-L GEDQTT8TS-L WD-SRNS-2S-L
6 9830473                                                             SPA$FFSB0S

Update

After reading original poster's comment, I decided to update my solution, to count how many rows contain two specified string patterns.

Here one row is a unique ID. So I assume that the request can be rephrased to "How many IDs contain two specified string patterns?" If this is the case, I would prefer not to collapse all the observations. Because after collapsing all observations to from one ID per row, we need to develop a strategy to match the string, such as using the regular expression. I am not familiar with regular string, so I will leave this for others to provide solutions.

In addition, the original poster did not specify which two strings are the targeted, so I would develop a strategy that the users can replace the targeted string case by case.

dat <- map(sales2, data_frame) %>%                    # Convert each list element to a data frame
  bind_rows(.id = "ID") %>%                           # Combine all data frame
  rename(Value = `.x[[i]]`)                           # Change the name of the second column

# After this, there is no need to collapse the rows

# Set the target string, User can change the strings here
target_string1 <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")       

dat2 <- dat %>%
  filter(Value %in% target_string1) %>%               # Filter rows matching the targeted string
  distinct(ID, Value, .keep_all = TRUE) %>%           # Only keep one row if ID and Value have exact duplicated
  count(ID) %>%                                       # Count how many rows per ID
  filter(n > 1) %>%                                   # Keep only ID that the Count number is larger than 1
  select(ID)

dat2

# A tibble: 1 × 1
       ID
    <chr>
1 7143443
www
  • 38,575
  • 12
  • 48
  • 84
  • Hey @ycw , thanks for your answer but it is giving me a error Error: Arguments to `rename()` must be unquoted variable names. Argument `Value` is not. can you help me with that – Tapan Apr 18 '17 at 18:01
  • What is the column name of your second column after you used `bind_rows`? – www Apr 18 '17 at 18:12
  • Did you add `` to enclose the second column name? – www Apr 18 '17 at 18:23
  • ohh i didn't enclose it with `` . Thank you ycw. Another issue, when i am subsetting using `code` **s2<- nrow (subset(dat, Value == "CISCO2901-V/K9" & Value == "WS-C2960-48TC-L" )) ** it is not giving correct result . it is giving 0 but it should give around 40. Can you help me with that – Tapan Apr 19 '17 at 07:51
  • hello @ycw , thank so much for update , but I am getting dat2 without any value .I am taking id -> 9273628` = "O1941-SE9", "CCO887VA-K9", saving target_string1 <-c("O1941-SE9", "CCO887VA-K9") . Copying the similar code with same dataframe, Is I am missing on something. And if I take single element in the target string , i am getting again empty tibble . please help me out !! – Tapan Apr 19 '17 at 18:31
  • @Tapan Please run the code line by line and observe when does the `dat2` become empty. After `filter`, the row numbers should be 2, After `distince`, the row number is still 2. After `count`, the row number becomes 1, and it remains to be 1 until the end. This is my observation when I set `target_string1 <-c("O1941-SE9", "CCO887VA-K9")` – www Apr 19 '17 at 18:51