2

I need to extract text from a block of text. The block looks like this:

{"some text"},{"some more text"},{"some other"},{"some other text"},{"let's add more"},{"even more other text"}

The text is supposed to be in pairs so that the first and second {""},{""} go into the same cell and so forth. I tried the split function but since the delimiter is a comma I am not able to split in pairs. So I need to split after every 2 commas and put the resulting pair in it's own cell. Then Transpose the result so that the text goes inside a column vertically.

I tried =SPLIT("",",") but didn't get the desired output.

I am using Google sheets for this and would like to do this inside Google Sheets, but I'm open to doing this inside Excel.

If you want to have a look at the Google Sheets File, here is the link https://docs.google.com/spreadsheets/d/1sbK6nF-6SO4oOKTdercgnfHGeQol-vfV5SRcFXNBy5o/edit?usp=sharing

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • Questions should be self contained. While external links are welcome, consider adding a ascii table like ([this](https://stackoverflow.com/q/63837444) or [this](https://stackoverflow.com/q/64182673/)) or screenshots or csv text(like [this](https://stackoverflow.com/a/64186520/)) to show your data structure. – TheMaster Oct 21 '20 at 09:43

2 Answers2

5

Interesting and fun query.

In B2, see if the following formula works for you:

=TRANSPOSE(SPLIT(REGEXREPLACE(A2,"([^,]+,[^,]+),","$1|"),"|"))

The regular expression can be made more specific to target only those comma's in between curly brackets if need be.


Edit

To SPLIT your desired outcome into two columns and get rid of the curly brackets and outer quotes I came up with:

=TRANSPOSE({SPLIT(REGEXREPLACE(A2,"{""([^,]+)""},{""([^,]+)""}(?:,|$)","$1|"),"|");SPLIT(REGEXREPLACE(A2,"{""([^,]+)""},{""([^,]+)""}(?:,|$)","$2|"),"|")})
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • Added this to D2 ```=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IF(C2:C="","",(split(C2:C,","))),"{",""),"}",""),"""",""))``` Thanks to your formula, I have separated the text but I'm not able to make it into one single formula. I wanted to manipulate your formula a little further but I can't. If you have a moment, please see my attempt in D2. – mary ongubo Oct 21 '20 at 09:03
  • 1
    @maryongubo, I answered that to the best of my GS knowledge. – JvdV Oct 21 '20 at 09:41
  • 1
    That is the formula I was looking for and you nailed it! Thanks. – mary ongubo Oct 21 '20 at 10:23
  • This solution does not tolerate commas inside the strings. – kishkin Oct 21 '20 at 11:34
0

To adapt the solution for ARRAYFORMULA we just need to join all the strings with comma.

A few notes on the following solution:

With REGEXREPLACE we strip all the surrounding {"..."}. At the same time we replace every even comma (not just every comma, but the ones between records, e.g. commas inside the strings won't be affected) with "♠" for later splitting pairs into the separate strings, and every odd comma we replace with "♥" for later splitting a pair into two strings.

=ARRAYFORMULA(
  SPLIT(
    TRANSPOSE(
      SPLIT(
        REGEXREPLACE(
          TEXTJOIN(
            ",",
            True,
            A2:A
          ),
          "\{""(.*?)""\},\{""(.*?)""\},?",
          "$1♥$2♠"
        ),
        "♠"
      )
    ),
    "♥"
  )
)

enter image description here

kishkin
  • 5,152
  • 1
  • 26
  • 40
  • The strings that I'm dealing with won't be having any commas, but I must agree that the above is a safe way, just in case. I have stumbled into another challenge. Some strings don't have a match so they won't be in pairs. I wish to add an empty pair for the string which doesn't have it's match. Ideally, the first of the pair will have a number between 1-20 while the second is basically text. Example - ```{"This doesn't change":"2"},{"This doesn't change":"This changes"},``` But there are cases where the second of the pair is ```{"This doesn't change":"This changes"}``` only. See document link – mary ongubo Oct 21 '20 at 13:59
  • @maryongubo could you show how that looks in a shhet? – kishkin Oct 21 '20 at 14:18
  • The link https://docs.google.com/spreadsheets/d/1sbK6nF-6SO4oOKTdercgnfHGeQol-vfV5SRcFXNBy5o/edit#gid=891711070 – mary ongubo Oct 21 '20 at 14:21
  • While this might be a unique case, I hope this answer by @kishkin could be of some help to someone. ```=ARRAYFORMULA( SPLIT( TRANSPOSE( SPLIT( REGEXREPLACE( TEXTJOIN( ",", True, A2:A ), "(?:\{""yellow and red"":""(\d*)""\},)?\{""yellow and red"":""(.*?)""\},?", "$1♥$2♠" ), "♠" ) ), "♥", True, False ) )``` – mary ongubo Oct 21 '20 at 14:53
  • I'm trying to revert the order from ```Number, Name``` to ```Name, Number```. I changed the formula but it appears I'm missing something. ` =ARRAYFORMULA( SPLIT( TRANSPOSE( SPLIT( REGEXREPLACE( TEXTJOIN( ",", True, A1:A ), "\{""yellow and home team"":""(.*?)""\},?","$1♥$2♠?(?:\{""yellow and home team"":""(\d*)""\},)" ), "♠" ) ), "♥", True, False ) )` – mary ongubo Oct 22 '20 at 08:27
  • 1
    had to use `[^"]`: `=ARRAYFORMULA( SPLIT( TRANSPOSE( SPLIT( REGEXREPLACE( TEXTJOIN( ",", True, A1:A ), "\{""[^""]*"":""([^""]*)""\},?(?:\{""[^""]*"":""(\d*)""\},?)?", "$1♥$2♠" ), "♠" ) ), "♥", True, False ) )` – kishkin Oct 22 '20 at 12:43