I would like to find the number of unique company names in a data frame column that looks like this:

/organization/-fame
/ORGANIZATION/-QOUNTER
/organization/-qounter
/ORGANIZATION/-THE-ONE-OF-THEM-INC-
/ORGANIZATION/0NDINE-BIOMEDICAL-INC
/organization/0ndine-biomedical-inc

I separated out the company name from the values above using the str_split_fixed function:

split_prod <- str_split_fixed(rounds2$company_permalink,"/", 4)

and converted into a new data frame:

companyname <- data.frame(split_prod, stringsAsFactors = FALSE)

I got the output in four columns as mentioned below:

    X1     X2                     X3                   X4
        organization        -fame
        ORGANIZATION        -QOUNTER
        organization        -qounter
        ORGANIZATION        -THE-ONE-OF-THEM-INC-
        organization        0-6-com
        ORGANIZATION        004-TECHNOLOGIES
        organization        01games-technology
        ORGANIZATION        0NDINE-BIOMEDICAL-INC
        organization        0ndine-biomedical-inc

How can I calculate the number of unique company names now? I have tried:

    `distinct(rounds$X3)`  ----- not working
    `length(unique(rounds$X3))` --- gives the wrong count

Please help. Also, I am not sure whether I used the split function correctly. In particular, I am concerned about the number "4". I arrived at it by counting slash, organization, company name, slash, so I tried to split the string into four columns.

MLavoie
Krishna

2 Answers


The code:

length(unique(tolower(companyname$X3)))

returns the number of unique company names in the X3 column of your companyname data frame, ignoring differences in letter case.
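To illustrate why the case conversion matters, here is a minimal base-R sketch using the six company names from the question (typed in by hand as a stand-in for `companyname$X3`). As a side note, dplyr's `distinct()` is designed for data frames rather than bare vectors, which is likely why `distinct(rounds$X3)` did not work.

```r
# Sample values copied from the question, standing in for companyname$X3
x3 <- c("-fame", "-QOUNTER", "-qounter",
        "-THE-ONE-OF-THEM-INC-",
        "0NDINE-BIOMEDICAL-INC", "0ndine-biomedical-inc")

# Case-sensitive: upper- and lower-case spellings are counted separately
n_case_sensitive <- length(unique(x3))

# Case-insensitive: fold everything to lower case before de-duplicating
n_case_insensitive <- length(unique(tolower(x3)))

print(c(case_sensitive = n_case_sensitive,
        case_insensitive = n_case_insensitive))
```

On this sample the case-sensitive count is 6 and the case-insensitive count is 4, so a mismatch with a count made in Excel may simply come down to whether case was ignored there.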

onlyphantom
  • I have tried it, but I am not getting the right number. I compared it with the count I got directly in Excel after manipulating the data. – Krishna Apr 30 '18 at 10:52
  • Try: `length(unique(tolower(companyname$X3)))` if you count 0NDINE-BIOMEDICAL-INC and 0ndine-biomedical-inc as the same company – onlyphantom Apr 30 '18 at 10:54
  • Converts it to lowercase so `0NDINE-BIOMEDICAL-INC` and `0ndine-biomedical-inc` are treated as the same company – onlyphantom Apr 30 '18 at 10:59

Use either tolower or toupper (or str_to_lower / str_to_upper if you are using the stringr package). Otherwise -QOUNTER and -qounter will be counted as two different companies.

Full example:

library(stringr)
text <- c("/organization/-fame",
          "/ORGANIZATION/-QOUNTER",
          "/organization/-qounter",
          "/ORGANIZATION/-THE-ONE-OF-THEM-INC-",
          "/ORGANIZATION/0NDINE-BIOMEDICAL-INC",
          "/organization/0ndine-biomedical-inc")

split_prod <- str_split_fixed(text,"/", 4)

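companyname <- data.frame(split_prod, stringsAsFactors = FALSE)
str(companyname)   # inspect the four character columns X1..X4
head(companyname)
# Lower-case X3 before de-duplicating so case variants count once
length(unique(tolower(companyname$X3)))
[1] 4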

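Column X4 is created because you specify 4 in your str_split_fixed. Each string starts with a slash, so splitting on "/" gives only three pieces: an empty string (X1), organization (X2) and the company name (X3). The fourth column is padded with empty strings, so 3 would be enough here.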
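The piece count can be checked without stringr. The following is a small base-R sketch on three of the sample paths from the question. The `sub()` call is an alternative way to keep only the text after the last slash, offered as an illustration rather than as a replacement for the approach above.

```r
paths <- c("/organization/-fame",
           "/ORGANIZATION/-QOUNTER",
           "/organization/0ndine-biomedical-inc")

# The leading "/" yields an empty first piece, so each path splits into
# three pieces: "", the "organization" part, and the company name
pieces <- strsplit(paths, "/", fixed = TRUE)
print(lengths(pieces))

# Alternative: drop everything up to and including the last "/"
company <- sub("^.*/", "", paths)
print(company)
```

If the real data contains paths with more slashes than these samples, the piece count will differ, so it is worth checking `lengths(pieces)` on the full column before fixing the number of columns.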

phiver
  • Thank you! How does tolower do the magic? What is tolower used for? – Krishna Apr 30 '18 at 10:58
  • tolower puts all the characters in lowercase. So "This Text" will become "this text". – phiver Apr 30 '18 at 10:59
  • Great! How about "this text" and "-this text" or ":thistext", or anything beginning with a special character? Will it be counted twice or once? – Krishna Apr 30 '18 at 11:18
  • @Krishna, twice. They are not the same. If you want to remove special characters you will need to do some regex. str_remove will help, and as the pattern you need to specify all the special chars. If you need help with that, open a new question. There are loads of regex experts on SO – phiver Apr 30 '18 at 11:34
  • Thank you for your help! Now I would like to merge 2 data frames. One data frame has 115000 rows and 8 columns and the other has 67000 rows and 6 columns. Each data frame has one unique key, but the key columns have different names. Is the right method to rename the key column in one data frame to match the other and then merge? If yes, what is the code? – Krishna May 01 '18 at 14:37
  • @Krishna, `?merge` will give you the answers you need. If you need help with merging data.frames search SO for merge. Lots of questions, biggest answer is [this one](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right?s=1|131.7356). – phiver May 01 '18 at 15:01
  • Thank you. I have checked and tried `merg<-merge(x=companies, y=rounds2, by.x="permalink", by.y="company_permalink")`, but only the column names are merged and I am not getting any values in any column. Any idea? – Krishna May 01 '18 at 16:19
  • `merg<-merge(companies, rounds2, by.companies="permalink", by.rounds2="company_permalink")` Error: cannot allocate vector of size 28.4 Gb – Krishna May 01 '18 at 16:26
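The two follow-ups in the comments above (special characters and merging) can be sketched together in base R. Everything here is an assumption made for illustration: `normalize_key` is a hypothetical helper, the two toy data frames only borrow the names `companies` and `rounds2`, and the `name` and `raised_usd` columns are invented. One possible reason for a merge that returns no rows is that the key values differ in case between the two data frames, so the sketch lower-cases both keys first. Note also that `by.companies` and `by.rounds2` are not arguments of `merge`; the documented ones are `by`, `by.x` and `by.y`.

```r
# Hypothetical helper: lower-case a key and, optionally, drop every
# character that is not a lower-case letter or a digit
normalize_key <- function(x, strip_special = FALSE) {
  x <- tolower(x)
  if (strip_special) {
    x <- gsub("[^a-z0-9]", "", x)
  }
  x
}

# The variants from the comment: distinct as-is, identical once stripped
variants <- c("this text", "-this text", ":thistext")
n_raw <- length(unique(variants))
n_stripped <- length(unique(normalize_key(variants, strip_special = TRUE)))
print(c(raw = n_raw, stripped = n_stripped))

# Toy stand-ins for the two data frames (columns other than the keys are invented)
companies <- data.frame(
  permalink = c("/organization/-fame", "/organization/-qounter"),
  name = c("Fame", "Qounter"),
  stringsAsFactors = FALSE
)
rounds2 <- data.frame(
  company_permalink = c("/ORGANIZATION/-QOUNTER",
                        "/organization/-fame",
                        "/organization/-qounter"),
  raised_usd = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Build a shared, case-normalized key in both data frames, then merge on it
companies$key <- normalize_key(companies$permalink)
rounds2$key <- normalize_key(rounds2$company_permalink)
merged <- merge(rounds2, companies, by = "key")
print(merged)
```

Stripping special characters is a judgment call: it would also treat "-fame" and "fame" as the same company, so it is left off for the merge keys here. Merging on a single well-defined key also avoids the very large intermediate result that can appear when `merge` is not given usable key columns.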