0

Google Sheets has a cool function called importxml. It lets you scrape data from public web sources e.g. a URL's title.

You can write an XPath expression e.g. //title/text() and it will find all occurrences of some value and display those in your Google Sheet as separate rows. Nifty!

If the contents of <element></element> contain HTML markup e.g. <element>some<b>data</b>is<em>here</em>. then Google Sheets gives you two options:

  1. if you append /text() at the end of your XPath then the result is the direct text value of the element e.g. some here
  2. if you do not append /text() at the end of your XPath then the result is the entire text split across multiple cells in the row. You end up with |some|data|is|here|. (where | acts as my cell divider)

I am trying to get some data is here in a single cell, not across multiple cells on my row. Is there a way to do that?

Using textjoin or concatenate on importxml doesn't work because it then applies it to the entire result and if there are multiple instances of <element>some<b>data</b>is<em>here</em>. then I get them all in a single cell instead of one result per row (which is what I want).

player0
  • 124,011
  • 12
  • 67
  • 124
David Brossard
  • 13,584
  • 6
  • 55
  • 88

1 Answers1

2

try:

=TEXTJOIN(" "; 1; your_importxml_here)

for array use:

=FLATTEN(QUERY(TRANSPOSE(your_importxml_here);;9^9))
player0
  • 124,011
  • 12
  • 67
  • 124
  • Can you explain each part of flatten? – David Brossard Jan 08 '22 at 00:27
  • @DavidBrossard FLATTEN has no parts. it just flattens the array from whatever state into one single column. in this ocassion it can be replaced for TRANSPOSE. if you mean the QUERY(,,9^9) part then 9^9 is just big number you can replace with any number you wish it just needs to be higher than all rows of an array. so the formula will transpose the array then it will colapse/join all rows into one single row adding a space between each "cell" and then we just flatten the row into column so you will end up with joined array – player0 Jan 08 '22 at 00:36
  • @DavidBrossard in case you got multiple spaces (from empty cells) in joined data you can just use: `=INDEX(TRIM(FLATTEN(QUERY(TRANSPOSE(your_importxml_here);;9^9))))` – player0 Jan 08 '22 at 00:38
  • ok flatten(...) is spot on but I still don't get it. Why transpose? Why query? And how does it preserve each entry? Either way, thanks a million! – David Brossard Jan 08 '22 at 00:40
  • 1
    @DavidBrossard perhaps some visuals may help... search for: `vertical query smash` on https://stackoverflow.com/questions/65435313 and read downwards from there – player0 Jan 08 '22 at 01:29