
I'm trying to get, from each cell, just the values of the 'id' keys, joined by ';'.

The data is as follows:

Cell:

A1: {"id":1145585,"label":"1145585: Project Z"}

A2: {"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}

A3: {"id":1149240,"label":"1149240: Analysis of Technical Options"}|{"id":1149258,"label":"1149258: Check and Report"}

A4: {"id":1148925,"label":"1148925: Change Management Review"}|{"id":1148920,"label":"1148920: Follow-Up Meetings"}|{"id":1148923,"label":"1148923: Launch Date Definition"}

I have tried using the LEFT, MID and FIND functions; however, the number of IDs in a cell can vary from 1 to 1000. I'm also trying to avoid VBA, but it seems to be the only option, so any solution is great!

The result should be:

Cell:

A1: 1145585

A2: 1150322;1150365

A3: 1149240;1149258

A4: 1148925;1148920;1148923
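For reference, the transformation can be stated precisely outside Excel: each cell holds one or more JSON objects joined by `|`. A minimal Python sketch (Python is used only to illustrate the logic, not as an Excel solution), assuming `|` never appears inside a label and every object has an `id` key:

```python
import json


def extract_ids(cell: str) -> str:
    """Return the 'id' value of each |-separated JSON object, joined by ';'."""
    # Assumption: '|' only ever separates objects and never occurs inside a label.
    objects = (json.loads(part) for part in cell.split("|"))
    return ";".join(str(obj["id"]) for obj in objects)


cells = [
    '{"id":1145585,"label":"1145585: Project Z"}',
    '{"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}',
]
for cell in cells:
    print(extract_ids(cell))  # 1145585, then 1150322;1150365
```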

Any ideas?

Thanks!

VBasic2008

dantas
  • Possible duplicate: https://stackoverflow.com/questions/36835136/remove-text-appearing-between-two-characters-multiple-instances-excel – Ricardo Diaz Sep 12 '19 at 23:30
  • Google how to use [Regular Expressions](https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops) in VBA; there are many tutorials. Try something on your own before you ask us to do the work for you. Use a pattern like this one, for example, [`"id":([0-9]*)`](https://regex101.com/r/PDTwsn/2) to get all the IDs. – Pᴇʜ Sep 13 '19 at 06:18
  • Alternatively, do a RegEx substitution with this: https://regex101.com/r/is7xLx/1 You can paste your data in the "Test String" and get your IDs in the "Substitution". – Pᴇʜ Sep 13 '19 at 06:37
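To see what the pattern from the comment captures, here is a small sketch using Python's `re` module as a stand-in for the VBA regex objects those tutorials describe (an assumption for illustration only; it is not a VBA solution). `findall` returns the contents of the capture group, so the ids stay text and any leading zeros are kept:

```python
import re

# The pattern suggested in the comment: capture the digits that follow "id":
ID_PATTERN = re.compile(r'"id":([0-9]*)')


def extract_ids_regex(cell: str) -> str:
    """Join every captured id in the cell with ';'."""
    # With one capture group, findall returns a list of the captured strings.
    return ";".join(ID_PATTERN.findall(cell))


print(extract_ids_regex(
    '{"id":1149240,"label":"1149240: Analysis of Technical Options"}'
    '|{"id":1149258,"label":"1149258: Check and Report"}'
))  # 1149240;1149258
```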

2 Answers


Sounds like a task for #powerquery. Please refer to this article to find out how to use Power Query in your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration uses Excel 2016.

The steps are:

  1. Load the source data into the Power Query Editor; it should look like the following:

[Screenshot: Step 1]

  2. Use the Index Column function on the Add Column tab to add an Index column.

[Screenshot: Step 2]

  3. Use the Split Column function on the Transform tab to split the column by the custom delimiter "id": and put the results into rows, as shown below:

[Screenshot: Step 3]

  4. Use the Extract function on the Transform tab to extract the first 7 characters of the column (this assumes every id is exactly 7 digits long).

[Screenshot: Step 4]

  5. Change the data type to Whole Number, remove errors (this removes the non-numeric pieces, such as the leading "{"), and then change the data type back to Text.

[Screenshot: Step 5]

  6. Use the Group By function on the Transform tab to group Column1 by Index, as set out below. Don't panic if the result is an error; that is expected.

[Screenshot: Step 6]

  7. Go back to the last step and replace the original formula in the formula bar with the following one, because the Group By dialog has no option to concatenate text with Text.Combine:

```
= Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
```

[Screenshot: Step 7]

  8. Close & Load the output to a new worksheet (the default), and you should have the following:

[Screenshot: Step 8]
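The steps above can be mirrored in plain Python to make the data flow explicit. This is a rough sketch, not Power Query itself: the function name `power_query_steps` is hypothetical, and Python's `int()` stands in for the Whole Number conversion:

```python
def power_query_steps(cells: list[str]) -> dict[int, str]:
    """Mimic the Power Query steps on a list of cell strings."""
    # Step 2: add a 1-based index so pieces can be regrouped per original cell.
    rows = [(index, text) for index, text in enumerate(cells, start=1)]
    # Step 3: split each cell on the delimiter "id": and expand into rows.
    rows = [(index, piece) for index, text in rows for piece in text.split('"id":')]
    # Step 4: keep only the first 7 characters of each piece.
    rows = [(index, piece[:7]) for index, piece in rows]
    # Step 5: convert to a whole number, drop pieces that fail, convert back to text.
    kept = []
    for index, piece in rows:
        try:
            kept.append((index, str(int(piece))))
        except ValueError:
            pass  # plays the role of Table.RemoveRowsWithErrors
    # Steps 6-7: group by index and combine each group's texts with ';'.
    grouped: dict[int, list[str]] = {}
    for index, text in kept:
        grouped.setdefault(index, []).append(text)
    return {index: ";".join(texts) for index, texts in grouped.items()}


print(power_query_steps([
    '{"id":1145585,"label":"1145585: Project Z"}',
    '{"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}',
]))  # {1: '1145585', 2: '1150322;1150365'}
```

Two side effects carry over from the Power Query steps: converting to a whole number and back drops any leading zeros, and the fixed 7-character cut only works while every id is exactly 7 digits. In this sketch, a 6-digit id would keep its trailing comma, fail the `int()` conversion and be dropped silently.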

Here is the Power Query M code behind the scenes. All of the steps are generated through the built-in UI except the last one, where the formula was replaced manually. Let me know if you have any questions. Cheers :)

```
let
    // Load the table that holds the raw cell text in Column1
    Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    // Step 2: number each original cell so its pieces can be regrouped later
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
    // Step 3: split on "id": and expand the pieces into rows
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Column1", Splitter.SplitTextByDelimiter("""id"":", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1", type text}}),
    // Step 4: keep the first 7 characters (assumes 7-digit ids)
    #"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1", each Text.Start(_, 7), type text}}),
    // Step 5: non-numeric pieces such as "{" fail the cast and are removed
    #"Changed Type2" = Table.TransformColumnTypes(#"Extracted First Characters",{{"Column1", Int64.Type}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type2", {"Column1"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Removed Errors",{{"Column1", type text}}),
    // Steps 6-7: group by Index and join the ids with ";"
    #"Grouped Rows" = Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
in
    #"Grouped Rows"
```
Terry W
  • Nicely done, although you could also remove the extraneous rows with a filter. Nice edit of the SUM function. I've converted the Table to a List with `Table.Column` and then combined it. Your method is simpler for numeric data. – Ron Rosenfeld Sep 13 '19 at 11:14
  • @RonRosenfeld I actually tried to use FILTERXML but was only able to find the first number string. I guess this function is not good for finding multiple strings, or I need to try harder :) – Terry W Sep 13 '19 at 11:22
  • Multiple number strings can be returned, but an issue with `FILTERXML` is that it will convert numeric data to "real" numbers, thereby dropping any leading zeros or other formatting. I tend to avoid it when the numeric format is not necessarily fixed. – Ron Rosenfeld Sep 13 '19 at 11:54
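One possible reading of the filter suggestion in the first comment (the exact filter Ron used is not shown, so this is an assumption) is to keep only the pieces that start with a digit instead of casting and removing errors. The Python sketch below also keeps the whole run of leading digits as text, which removes the 7-character assumption and preserves leading zeros; `ids_by_filter` is a hypothetical name:

```python
from itertools import takewhile


def ids_by_filter(cell: str) -> str:
    """Split on "id":, keep pieces that start with a digit, take their leading digits."""
    pieces = cell.split('"id":')
    # Filter instead of casting and removing errors: the '{' fragments are skipped.
    numeric = [p for p in pieces if p[:1].isdigit()]
    # Keep the whole run of leading digits as text: no fixed width is needed
    # and leading zeros survive.
    return ";".join("".join(takewhile(str.isdigit, p)) for p in numeric)


print(ids_by_filter('{"id":0012345,"label":"x"}|{"id":987,"label":"y"}'))  # 0012345;987
```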

Based on @TerryW's comment, here is a solution using the FILTERXML function, which is available in Excel 2013+. It also requires TEXTJOIN, which did not appear until later versions of Excel 2016 (and Office 365).

It relies on the fact that the id string is always followed by a comma.

A disadvantage is that FILTERXML returns the numeric ids as numeric values, so leading zeros will be dropped. If the ids always have a fixed number of digits and the leading zeros need to be kept, this can be mitigated by using the TEXT function.
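To illustrate the issue and the fix: once the text "0012345" has been converted to a number, the zeros are gone, and padding back to a fixed width restores them. That is what a format such as `TEXT(value,"0000000")` would do for 7-digit ids (the width is an assumption). A Python analogue, with a hypothetical helper name:

```python
def restore_leading_zeros(value: int, width: int = 7) -> str:
    """Pad a numeric id back to a fixed width, like TEXT(value, "0000000")."""
    return f"{value:0{width}d}"


# FILTERXML-style conversion: the text "0012345" becomes the number 12345.
as_number = int("0012345")
print(as_number)                         # 12345
print(restore_leading_zeros(as_number))  # 0012345
```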

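We construct an XML string by first replacing each `"id":` with `,id,` and then turning every comma into a node boundary, so that each `id` marker sits in its own node, immediately followed by the node holding the number.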

We then use an XPath expression to return the node that immediately follows each node containing `id`.

```
=TEXTJOIN(";",TRUE,FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A1,"""id"":",",id,"),",","</s><s>")&"</s></t>","//s[text()='id']/following-sibling::*[1]"))
```

Since this is an array formula, you need to "confirm" it by holding down Ctrl+Shift while pressing Enter. If you do this correctly, Excel will place braces {...} around the formula, as you can see in the formula bar.
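The two SUBSTITUTE calls and the XPath can be traced step by step. The sketch below mirrors them in Python with `xml.etree.ElementTree`; because ElementTree's limited XPath support has no `following-sibling` axis, the sibling lookup is done by position. Unlike FILTERXML, it returns the ids as text. The function name `filterxml_ids` is hypothetical:

```python
import xml.etree.ElementTree as ET


def filterxml_ids(cell: str) -> str:
    """Trace the TEXTJOIN/FILTERXML formula on one cell's text."""
    # Inner SUBSTITUTE: put each "id" marker between commas.
    marked = cell.replace('"id":', ",id,")
    # Outer SUBSTITUTE plus wrapper: every comma becomes a node boundary.
    xml_text = "<t><s>" + marked.replace(",", "</s><s>") + "</s></t>"
    nodes = list(ET.fromstring(xml_text))
    # Emulate //s[text()='id']/following-sibling::*[1] by position.
    ids = [nodes[i + 1].text for i, node in enumerate(nodes[:-1]) if node.text == "id"]
    # TEXTJOIN(";", TRUE, ...): join with ';' and skip empty values.
    return ";".join(text for text in ids if text)


print(filterxml_ids(
    '{"id":1150322,"label":"1150322: Project Waka 1"}'
    '|{"id":1150365,"label":"1150365: Project Waka 2"}'
))  # 1150322;1150365
```

Like the formula, this relies on the labels containing no characters that are special in XML: a label with `&` or `<` would make `ET.fromstring` raise a `ParseError`, and the FILTERXML formula would presumably fail on the same input.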

[Screenshot: Source]

[Screenshot: Results]

Ron Rosenfeld
  • Great work! Now I see how to return multiple strings using **FILTERXML**: the key is to use **SUBSTITUTE** to put a comma `,` or another symbol before and after the target string, replace it with `` and then use `following-sibling` to return the strings. – Terry W Sep 16 '19 at 00:01