4

I have a Google Spreadsheet with thousands of cells with each cell being populated with strings with many different emojis.

Example of entries:

"Lol ","Haha ","Fire ","Awesome!","Nice See you tomorrow!", "こんにちは", "你好"

But I want to delete all of the emojis, is there a search function I can run/piece of Spreadsheet code I can run to make the document devoid of emojis?

kentobento
  • 55
  • 1
  • 7

4 Answers4

4

Replace emojis from text

I've found, you may use a REGEXREPLACE for that.

To replace all emojis from [A1] please try:

=REGEXREPLACE($A$1,"[©®‼⁉™ℹ↔-↙↩-↪⌚-⌛⌨⏏⏩-⏳⏸-⏺Ⓜ▪-▫▶◀◻-◾☀-☄☎☑☔-☕☘☝☠☢-☣☦☪☮-☯☸-☺♀♂♈-♓♟-♠♣♥-♦♨♻♾-♿⚒-⚗⚙⚛-⚜⚠-⚡⚧⚪-⚫⚰-⚱⚽-⚾⛄-⛅⛈⛎-⛏⛑⛓-⛔⛩-⛪⛰-⛵⛷-⛺⛽✂✅✈-✍✏✒✔✖✝✡✨✳-✴❄❇❌❎❓-❕❗❣-❤➕-➗➡➰➿⤴-⤵⬅-⬇⬛-⬜⭐⭕〰〽㊗㊙-----------------------------------------------#️⃣*️⃣0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣]","")

I believe this regex will find all current emojis from your text.

Notes:

  • some emojis are compound for instance, an astronaut is ‍. Regex needs to find only solid chars, so all compound emojis will be included.

  • I've tried to shorten the solution, and used actual emojis in RegEx. You may also see more "computer-like" solutions: [\u1F60-\u1F64]|[\u2702-\u27B0].... Those solutions use codes of emojis instead.

  • Another interesting option is given here. Remove all not printable chars: =REGEXREPLACE(A1,"[[:print:]]","")

  • skins are included:

enter image description here

Max Makhrov
  • 17,309
  • 5
  • 55
  • 81
3

Cleaning Up with Regular Expressions

I don't have the time to do the whole thing but this will give you a start. I cleaned everything in one cell with this.

var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w]/g,'');//replace everything that's not whitespace or word characters with null
  sht.getActiveCell().setValue(cleantext);

I used the line you provided as test data. Admittedly it needs a little tweaking because it's getting rid of some punctuation.

This is a little better.

function test()
{
  var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w"!,]/g,'');//added "!,
  sht.getActiveCell().setValue(cleantext);

}

So as you run it you may want to add a few more characters to don't replace list. That's it.

I have an expense report that I use to collect my expenses in different categories and I like to produce pie charts to help me get a big picture view of where my money is going. I use this Array Formula to help me gather the information into useful categories for me.

=ArrayFormula(IF(Row(C:C)=1,"Title",IF(LEN(C:C),IF(REGEXMATCH(C:C,"(?i)(string1|string2|string3|string4)"),D:D,""),)))

The regular expression provides an or function for adding additional matching for unexpected item appearing on my expense lists that I want to gather into these categories. If you need another matching term you just go into that formula and add another term as shown below

(string1|string2|string3|string4||string5)

The strings are replaced with real terms with no quotes unless they have quotes around them in the search target.

Cooper
  • 59,616
  • 6
  • 23
  • 54
  • 1
    So to avoid clearing Japanese punctuation marks (。and、) and question I wrote: `var cleantext = text.replace(/[^\s\w"!,。、.?]g,' ');` and it worked perfectly! – kentobento Mar 10 '17 at 01:37
  • That's great. There's nothing really regular about regular expressions but they're hard to beat when it comes to finding stuff in text. And in this case it's probably easier to figure out what you want to keep and then delete the rest. Here's a link you might like to take a look at [check it out](https://github.com/google/re2/blob/master/doc/syntax.txt). – Cooper Mar 10 '17 at 02:22
  • I have a question, in for `var cleantext = text.replace(/[^\s\w"!,]/g,'');` I'm not certain how that gets rid of emojis. The text is being passed on the the replace function, and I'd understand if all the things within the brackets got deleted, but it's the opposite that's happening here. – kentobento Mar 10 '17 at 04:01
  • 1
    Well the idea is that you will replace everything except those things that are inside the set because the set starts with a negate operator so it's saying replace everything that's not in this set. \s is white space \w is word characters and I had ! and , and " and it looks like you added and few more things. – Cooper Mar 10 '17 at 04:34
  • Hey Cooper, I asked another question in a different thread, but I was wondering if you can help concerning the specific code you wrote. How would I leave [ and ] (square brackets) within the text? I've tried escaping with backspace, double backspace, as well as wrapping those in quotes but to no avail. – kentobento Mar 14 '17 at 21:42
  • If you're referring to these square brackets "[^\s\w"!,]" they are just delimiters for a regular expression set. If you wanted them to not be removed you could add them like this "[^\s\w"!\\[\\],]" – Cooper Mar 14 '17 at 23:22
2

Here is some code that goes through one column of data and removes emojis from each cell.

You must replace Your Sheet Tab Name with the sheet tab name that the code should work on. This code currently only processes one column of data. The entire column of values is written back to the sheet in one write operation. Any character codes that are 5 characters or more are assumed to be emojis.

Test it on a few rows of data first.

function killEmojies() {
  var arrayThisRow,columnOfValues,columnToRemoveEmojiesFrom,firstTwoChar,
      i,innerArray,j,L,newCellContent,outerArray,
      ss,sh,
      targetSheet,thisCell,thisCellChar,thisCellVal,thisCharCode,thisCharCodeLength;

  columnToRemoveEmojiesFrom = 1;
  outerArray = [];

  ss = SpreadsheetApp.getActiveSpreadsheet()
  sh = ss.getSheetByName("Your Sheet Tab Name Here");
  targetSheet = ss.getSheetByName("Your Sheet Tab Name Here");

  columnOfValues = sh.getRange(1, columnToRemoveEmojiesFrom,sh.getLastRow(),1).getValues(); 
  L = columnOfValues.length;
  Logger.log('L: ' + L);

  for (i=0;i<L;i++) {

    thisCell = columnOfValues[i];//Get inner array
    thisCellVal = thisCell[0];//Get first element of inner array

    Logger.log(thisCellVal)
    Logger.log('typeof thisCellVal: ' + typeof thisCellVal)

    newCellContent = "";//Reset for every cell
    innerArray = [];//Reset for every row loop

    if (typeof thisCellVal !== 'string') {//This spreadsheet cell contains something
      //other than text
      innerArray.push(thisCellVal);
    } else {
      for (j=0;j<thisCellVal.length;j++) {//Loop through every character in the cell
        thisCellChar = thisCellVal[j];
        thisCharCode = thisCellChar.charCodeAt(0);//Character code of this character
        thisCharCodeLength = thisCharCode.toString().length;

        Logger.log('typeof thisCharCodeLength: ' + typeof thisCharCodeLength);
        Logger.log('this val: ' + thisCharCode);
        Logger.log('thisCharCodeLength: ' + thisCharCodeLength);
        Logger.log(thisCharCodeLength < 5);

        if (thisCharCodeLength === 5) {
          firstTwoChar = thisCharCode.toString().slice(0,2);
          Logger.log('firstTwoChar: ' + firstTwoChar)

        }

        if (thisCharCodeLength > 4 && (firstTwoChar === "54" || firstTwoChar === "55" || firstTwoChar === "56")) {
          continue;//exclude character codes that are 5 or more characters long
          //and start with 54 or 55
        }
        newCellContent = newCellContent + thisCellChar;

      }

      innerArray.push(newCellContent);
    }

    outerArray.push(innerArray);
  }

  targetSheet.getRange(1, columnToRemoveEmojiesFrom,outerArray.length,1).setValues(outerArray);
}
Alan Wells
  • 30,746
  • 15
  • 104
  • 152
  • I must apologize. Thank you, this works, but it is also deleting Japanese and Chinese characters as well. I should have mentioned that those are included in the text. I will edit the question accordingly. – kentobento Mar 10 '17 at 01:19
  • I modified the code to further detect more specific ranges of character codes. I tested it and it left in Chinese and Japanese characters. – Alan Wells Mar 10 '17 at 01:43
  • Thank you Sandy Good! I actually ended up using your code, to loop through the column and combined it with Cooper's code to take out the emojis. – kentobento Mar 10 '17 at 03:09
  • Will definitely do! – kentobento Mar 10 '17 at 04:01
0

Assuming all your text strings are single words followed by a space and then an Emoji, you can use the formula

=LEFT(A1,(FIND(" ",A1,1)-1))

This will return the textual contents of a cell only (A1 in this example). If all your data is in a single column, you can just pull down and this will apply to all your data.

Samer
  • 63
  • 1
  • 2
  • 9