Suppose I have a data frame column of strings. I want to assign group numbers so that two strings fall in the same group only when they have the same length and contain the same characters, regardless of order.

The output should be grouped as in the sample below:

Rule                      Group
x                           1
x                           1
xx                          2
xx                          2
xy                          3
yx                          3
xx                          2
xyx                         4
yxx                         4
yyy                         5
xyxy                        6   
yxyx                        6
xyxy                        6
Sotos
NiMbuS
  • I have been able to write a function that gives me the desired output in Python, but I am unable to get the desired output in R. – NiMbuS Apr 18 '19 at 09:35
  • Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and reformat your question accordingly. We can't help you without a clear description of your problem and knowing what you have done. – Julian_Hn Apr 18 '19 at 09:38
  • Suppose the column in the data frame is similar to the "Rule" column in the sample. I want to group the column based on string length and string characters. – NiMbuS Apr 18 '19 at 09:53

1 Answer

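You can split each Rule into its characters, sort them, and paste them back together, so that strings with the same characters in any order produce the same key. Matching those keys against their unique values then gives the group numbers. In R: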

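# Sort the characters within each string so reordered strings share one key
# (as.character() guards against Rule being stored as a factor)
v1 <- sapply(strsplit(as.character(df$Rule), ''), function(i) paste(sort(i), collapse = ''))
# Number each key by the order of its first appearance
match(v1, unique(v1))
#[1] 1 1 2 2 3 3 2 4 4 5 6 6 6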
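
For a self-contained version, here is a minimal sketch of the same idea. It assumes the data frame is called `df` and holds the question's sample values in a character column `Rule`. The helper `sort_chars` and the variable `key` are hypothetical names introduced only for this illustration.

```r
# Sample data reconstructed from the question's table
# (assumption: Rule is stored as character, not factor)
df <- data.frame(
  Rule = c("x", "x", "xx", "xx", "xy", "yx", "xx",
           "xyx", "yxx", "yyy", "xyxy", "yxyx", "xyxy"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a canonical key by sorting a string's characters
sort_chars <- function(s) paste(sort(strsplit(s, "")[[1]]), collapse = "")

# vapply always returns a character vector, even for zero-row input
key <- vapply(df$Rule, sort_chars, character(1), USE.NAMES = FALSE)

# Group ids are numbered by the first appearance of each key
df$Group <- match(key, unique(key))
df
```

Because the key keeps every character, strings of different lengths can never share a key, so the length condition is covered without a separate check.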
Sotos