I would like to find a specific value in a column of a data frame and replace it with a value of my choice.

For example, I have a data frame with city names (column 1) and frequencies (column 2). Some cities are listed with a district number, so R treats them as different cities because the names do not match.

Example:

--> I have:

      City     Freq
1    Paris 01   69
2    Paris 03   60
3    Paris 15   12
4    Paris 20   2
5    Toulouse   60
6    Paris      15
7    Lille      12

--> I would like:

      City Freq
1    Paris   69
2 Toulouse   60
3    Lille   12

I tried to use the gsub function, but I don't know how to apply it here. I also tried some if statements, without success. I looked for answers before posting, but the examples I found are simpler and only involve changing an entire column.

Thank you for helping me!

Here is some information about my data:

dput(droplevels(head(data))) 

structure(list(City = structure(c(1L, 4L, 3L, 5L, 2L, 6L), .Label = c("PARIS", "PARIS 13", "PARIS 15", "PARIS 16", "PARIS 18", "PARIS 20"), class = "factor"), Freq = c(8859L, 3843L, 3583L, 2651L, 2586L, 2464L)), .Names = c("City", "Freq"), row.names = c(19380L, 19396L, 19395L, 19398L, 19393L, 19400L), class = "data.frame")
Marie
  • Based on your new `data` dput output, the aggregate should be a single row, and that is what I get: `aggregate(Freq~City, transform(data, City=tolower(sub("\\s+.*$", '', City))), FUN=sum)` returns one row with City `paris` and Freq `23986`. – akrun Jun 22 '15 at 14:27
  • Check if you have any non-ASCII characters. You can also look [here](http://stackoverflow.com/questions/4993837/r-invalid-multibyte-string) – akrun Jun 22 '15 at 14:30
  • Thank you, I had a problem with the non-ASCII characters! Now it is working perfectly! – Marie Jun 22 '15 at 14:43
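
For readers who hit the same `invalid multibyte string` error, here is a minimal sketch of one possible fix. It assumes the city names are stored as Latin-1 bytes while R expects UTF-8; the actual encoding of the data in this thread is not known, and `city_raw`, `city_fixed` and `city_key` are illustrative names only.

```r
# Assumption: the raw bytes are Latin-1 (for example, a file read on a UTF-8
# system without declaring its encoding). "\xe7" is the Latin-1 byte for "ç".
city_raw <- c("Besan\xe7on 01", "Paris 15")

# validUTF8() flags the entries that are not valid UTF-8.
bad <- !validUTF8(city_raw)

# Re-encode from the assumed source encoding to UTF-8.
city_fixed <- iconv(city_raw, from = "latin1", to = "UTF-8")

# The district suffix can now be stripped and the case normalised.
city_key <- tolower(sub("\\s+.*$", "", city_fixed))
```

If the source encoding is something other than Latin-1, the `from` argument would need to change accordingly.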

1 Answer

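You can modify the 'City' column with sub, removing everything from the first whitespace onward and lower-casing the result, and then sum 'Freq' by city with aggregate: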

df2 <- transform(df1, City=tolower(sub("\\s+.*$", '', City)))
res <- aggregate(Freq~City,df2, FUN=sum)
res
#     City Freq
#1    lille   12
#2    paris   69
#3 toulouse   60

# Capitalise the first letter of each city name again
res$City <- sprintf('%s%s', toupper(substr(res$City,1,1)),
                 sub('^.', '', res$City))
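
One thing to be aware of: the pattern `\\s+.*$` cuts at the first whitespace, so a multi-word name such as "Le Mans" would be reduced to "Le". The sketch below is a variant, not the method above, that removes only a trailing numeric suffix. It assumes districts are always written as digits at the end of the name; `collapse_districts` and `toy` are hypothetical names made up for this illustration.

```r
# Hypothetical helper: drop a trailing numeric district suffix, lower-case the
# name so that "PARIS 20" and "Paris 01" match, then sum Freq per city.
collapse_districts <- function(df) {
  key <- sub("\\s+[0-9]+$", "", as.character(df$City))
  df$City <- tolower(key)
  aggregate(Freq ~ City, data = df, FUN = sum)
}

# Toy data, invented for illustration only.
toy <- data.frame(
  City = c("Paris 01", "PARIS 20", "Paris", "Le Mans", "Lille"),
  Freq = c(1, 2, 3, 4, 5)
)

out <- collapse_districts(toy)
```

With this variant "Le Mans" stays a single city, while the three Paris rows are summed together.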

data

df1 <- structure(list(City = structure(c(3L, 4L, 5L, 6L, 7L, 2L, 1L),
    .Label = c("Lille", "Paris", "Paris 01", "Paris 03", "Paris 15",
               "PARIS 20", "Toulouse"), class = "factor"),
    Freq = c(12, 15, 25, 2, 60, 15, 12)),
    .Names = c("City", "Freq"), row.names = c(NA, -7L), class = "data.frame")
akrun
  • I don't understand why your code is not working with my dataframe. I have the following error: Error in tolower(sub("\\s+.*$", "", City)) : invalid multibyte string 17715. What I am asking for is exactly the answer you gave me. – Marie Jun 22 '15 at 14:01
  • @Marie My code is reproducible based on the data I showed (which is the dput output). Your data structure is not known. It is better to show dput output – akrun Jun 22 '15 at 14:03
  • Ok, sorry. I am going to edit my post and show you my data (or at least head of the data). – Marie Jun 22 '15 at 14:05
  • @Marie You can show the output of `dput(droplevels(head(yourdata)))` – akrun Jun 22 '15 at 14:06