0

I am looking to extract the numbers of hours in a column TXT however PQ is unable to find a pattern due to the inconsistency in the structure of text. I have used Columns by Example but hasnt helped.

Is there a M code or combination of M code I can use?

Sample data:

TXT
Contract Staff w/e 26.06.21- Carer 9.5hrs- MNL
Contract staff w/e02.07.21 - Physio- 19.34hrs- ARK
Contract Staff w/e 04.07.21 - RN 13.25- MNL
Contract Staff w/ e 04.07.21 - carer 6- MNL
Contract Staff w/e25.06.21 - carer 12.5 - KLTL
Contract Staff w/e04.07.21 - RN 34hrs- KLTL
Contract Staff w/e04.07.21 - AIN 25.5hrs- KLTL
Contract Staff w/e26.06.21- Carer - 6hrs- MNL
Contract Staff w/e11.07.21 - 6hrs- MNL
Contract Staff wie 24.06.21 - Carer 8hrs - ARK
Contract Staff w/e 16.06.21 - EN 5.50- GL
Contract Staff w/e 16.06.21 - RN 5.25- GL
Contract Staff w/e 11.07.21 - RN 22hrs- MNL
Contract Staff w/e 11.07.21 - carer 27.75- MNL
Contract Staff w/e04.07.21 - RN 22.25hrs- KLTL
Contract Staff w/e04.07.21 - AIN 69.67 - KLTL
Contract Staff w/e04.07.21 - RN 5.75- KLTL
Contract Staff w/e10.07.21 - RN 16hrs- KLTL
Contract Staff w/e10.07.21- Carer 6hrs- KLTL
Contract Staff w/e11.07.21 - AIN 38.50- KLTL
Contract Staff w/e18.07.21 - RN 46-KLTL
Contract Staff w/e18.07.21 - AIN 17 -KLTL
Contract Staff w/e18.07.21 - Cleaner 24.50 -KLTL
Contract Staff w/e18.07.21 - AIN 19.5- MNL
contract staff w/16.07.21 - RN23.25hrs - MNL
contract staff WIE 25.07.21 - carer - 42.25hrs- MNL
contract staff w/E 18.07.21 - AIN 24.5 - KLTL
contract staff WIE 18.07.21 - Domestic 6- KLTL
Contract Staff w/e 13.07.21 - RN 6.25hrs- KLTL
Contract Staff w/e25.07.21 - RN 19.5- KLTL
Contract Staff w/e25.07.21 - AIN 13.5- KLTL
Contract Staff w/e18.07.21 - AIN 6hrs- ARK
Peter
  • 10,959
  • 2
  • 30
  • 47
SAN
  • 15
  • 2
  • You may want to define the patterns you need and then combine multiple text functions to extract accordingly. For example, you could start with ```Text.BetweenDelimiters``` using “-“ as delimiters. With Regex it easier to define and extract according to pattern, but it is not supported by PowerQuery out of the box, so you may need another tool for that. – Anonymous Jun 26 '22 at 01:28
  • Thanks for the repsonse..I know google sheets has REGEX functions.. Is there a place where I can get the reference on what REGEX to use to define the Patterns? – SAN Jun 26 '22 at 01:45
  • There are a few sites like this one where you can actually run your regex expression against a test string: https://regex101.com/ and you may start from here: https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops … there is e.g. an add-on for regex in Excel (but PowerQuery is not supported) – Anonymous Jun 26 '22 at 02:06
  • 1
    If you post sample input and expected output, someone will be able to provide a solution. There is a workaround to use REGEX in PQ but it is not recommended and quite slow. – Davide Bacci Jun 26 '22 at 07:41
  • Yes, OP can check Peter’s edited answer below. It looks quite right to extract the numbers of hours. I just mentioned regex as a more flexible alternative to modify/adjust, e.g. in case OP wants also to extract the substrings before the hours and after the hours or there are more “variants” of the string to extract from. While not supported in PQ out of the box, regex works directly as cell function with the add-on installed. – Anonymous Jun 26 '22 at 23:44

1 Answers1

1

You don't need regex for this:

  1. Add column, extracting text after "-"
  2. Remove all letters and remaining "-"
  3. Convert text to number
#"Inserted Text After Delimiter" = Table.AddColumn(
    Source, 
    "hours", 
    each Text.AfterDelimiter([TXT], "-"), type text
),
#"Remove Letters" = Table.TransformColumns(
    #"Inserted Text After Delimiter", 
    {{"hours", each Text.Remove(_, {"A".."z", "-"})}}
),
#"Changed Type to number" = Table.TransformColumnTypes(
    #"Remove Letters",
    {{"hours", type number}}
)
Peter
  • 10,959
  • 2
  • 30
  • 47