1

I am trying to separate information copied from a PDF table - id usually use text to columns but the only delamination is spaces and this then splits the data into multiple unusable columns

The data comes like this:

Raw Data
A1 Company 0
Company2 40000
name a 1
name b 15
name c 184
Big 17 Company 1887

I need the output to be:

Company Units
A1 Company 0
Company2 40000
name a 1
name b 15
name c 184
Big 17 Company 1887

So the company name (that might contain numbers) is separated for the unit number (that could be 1-5 digits long).

I haven't been able to figure out a way that uses =len() as the string length isn't a constant mixed with the last numbers not being a consistent number of digits.

I'm currently using:

=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1)) * ROW(INDIRECT("1:"&LEN(A2))), 0), ROW(INDIRECT("1:"&LEN(A2))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A2)))/10) 

This gives me all the numbers in the cell - which works for 90% of the data as most of the company's don't have numbers in their name. But for something like 'A1 Company 0' it gives 10 as the output not just the 0. I then go and manually edit the small number of companies that this happens too.

I then use a mixture of =LEN() =LEFT and =RIGHT to split the information up as required for the further automated analysis.

I'd prefer a formula over VBA/macro

I cant provide the actual data but I hope I've given enough examples in the table above to show the main problems (different company name lengths, companies with numbers in their name, different amount of digits representing the units)

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • If there are always spaces before the numbers, you could just use: `=RIGHT(SUBSTITUTE(A2," ",REPT(" ",100)),100)+0`. You could also use Power Query which is designed for stuff like this. – Rory Jun 08 '22 at 12:02

3 Answers3

1

FILTERXML() would best choice for this case. Try-

=FILTERXML("<t><s>"&SUBSTITUTE(A1:A6," ","</s><s>")&"</s></t>","//s[last()]")

Details about FILTERXML() from JvdV here.

enter image description here

Harun24hr
  • 30,391
  • 4
  • 21
  • 36
1

Using Libre Office, but this formula checks for the last space in the cell

=RIGHT(A1,LEN(A1)-FIND("@",SUBSTITUTE(A1," ","@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))

enter image description here

Taken from: https://trumpexcel.com/find-characters-last-position/

JonTout
  • 618
  • 6
  • 14
0

See if the following works for you:

enter image description here

Formula in B2:

=LEFT(A2,LEN(A2)-1-LEN(C2))

In C2:

=-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))

For those users using ms365's newest functions:

=HSTACK(TEXTBEFORE(A2," ",-1),TEXTAFTER(A2," ",-1))
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • 1
    This worked (and actually does some of the work I had the spreadsheet automating later on so might cut out some processing time!) – imnotcharlotte Jun 08 '22 at 11:02