I have a data set that includes DMA (Designated Market Area), but many of the DMAs show up as two different levels because the DMA is truncated, e.g. the DMA "Abilene-Sweetwater, TX" sometimes shows up as "Abilene-Sweetw"
Here is a snippet of the data set:
dma <- c("Abilene-Sweetw", "Abilene-Sweetwater, TX",
"Albany, GA", "Albany, GA",
"Albany-Schenec", "Albany-Schenec",
"Albany-Schenectady-Troy, NY", "Albany-Schenectady-Troy, NY")
cost <- c(0.46, 0.46, 0.45, 0.45, 0.32, 0.32, 0.32, 0.32)
DMA.df <- data.frame(dma, cost)
DMA.df
dma cost
1 Abilene-Sweetw 0.46
2 Abilene-Sweetwater, TX 0.46
3 Albany, GA 0.45
4 Albany, GA 0.45
5 Albany-Schenec 0.32
6 Albany-Schenec 0.32
7 Albany-Schenectady-Troy, NY 0.32
8 Albany-Schenectady-Troy, NY 0.32
Searches on SO and elsewhere turn up solutions that show how to manually combine multiple factor levels into one. Obviously I don't want to manually do this.
I am looking for a way to fix the truncated DMA and turn it into the "full" DMA (city-...-, state). One saving grace is that there is a pattern to the truncation - it cuts off at 14 letters. The solution would need to match all 14 characters because many DMAs start with the same name (e.g. "Albany, GA" and "Albany-..., NY").
Put another way, I need to find all the truncated DMAs that match the complete DMAs and turn the truncated DMA into the complete DMA.
Sample DF should look like this:
dma cost
1 Abilene-Sweetwater, TX 0.46
2 Abilene-Sweetwater, TX 0.46
3 Albany, GA 0.45
4 Albany, GA 0.45
5 Albany-Schenectady-Troy, NY 0.32
6 Albany-Schenectady-Troy, NY 0.32
7 Albany-Schenectady-Troy, NY 0.32
8 Albany-Schenectady-Troy, NY 0.32
Thanks in advance for any suggestions.