0

I am trying to do some transformation and I’m stuck. Here goes the problem description.

Below is the pipe delimited file. I have masked data!

AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46

In this file, we have got CalculatedCurrency field where we have multiple values delimited by a comma. The file also has field CalculatedCurrencyAmount which too has multiple values delimited by a comma. But I need to pick up only that currency value from CalculatedCurrency field which belongs to BranchCurrency (another field in the file) and of course corresponding CalculatedCurrencyAmount for that Currency.

Required output : -

AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp|ActualCurrency|ActualAmount 61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46|EUR|30.00

Please help.

Snaplogic Python Script

from com.snaplogic.scripting.language import ScriptHook
from com.snaplogic.scripting.language.ScriptHook import *
import csv

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    def execute(self):
        self.log.info("Executing Transform script")

        while self.input.hasNext():
            data = self.input.next()
            branch_currency = data['BranchCurrency']
            calc_currency = data['CalculatedCurrency'].split(',')
            calc_currency_amount = data['CalculatedCurrencyAmount'].split(',')

            result = None
        for i, name in enumerate(calc_currency):
            result = calc_currency_amount[i] if name == branch_currency else result
            data["CalculatedCurrencyAmount"] = result
            result1 = calc_currency[i] if name == branch_currency else result
            data["CalculatedCurrency"] = result1



            try:
                data["mathTryCatch"] = data["counter2"].longValue() + 33
                self.output.write(data)
            except Exception as e:
                data["errorMessage"] = e.message
                self.error.write(data)






        self.log.info("Finished executing the Transform script") 
hook = TransformScript(input, output, error, log)
Bilesh Ganguly
  • 3,792
  • 3
  • 36
  • 58
  • Would you mind doing this in some higher language? – ferdy Aug 21 '18 at 06:14
  • 1
    @SubhasreeMitra: And you want to do this with Posix constructs only???? Why not - at least - use awk or, as ferdy suggested, a "real" language. Even some shell with pattern matching capabilities (Zsh, bash) would be more helpful than a pure Posix shell. – user1934428 Aug 21 '18 at 06:30
  • Python, Javascript or Ruby is also fine – Subhasree Mitra Aug 21 '18 at 07:20
  • Why are you using a script snap for this? It hampers the performance and defeats the purpose of an IPaaS tool altogether. – Bilesh Ganguly Oct 21 '18 at 18:52

4 Answers4

0

Using awk:

awk 'BEGIN{FS=OFS="|"}
     NR==1{print $0,"ActualCurrency","ActualAmount";next}
     {n=split($9,a,",");split($10,b,",");for(i=1;i<=n;i++) if(a[i]==$5) print $0,$5,b[i]}' file

BEGIN{FS=OFS="|"} sets the input and ouput delimiter to |.

The NR==1 statement takes care of the header by adding the 2 strings.

The 9th and 10th fields are splitted based on the , separator, and values are set inside the arrays a and b.

The for loop is trying to find the value of the a array corresponding to the 5th field. If found, the corresponding value of the b is printed.

oliv
  • 12,690
  • 25
  • 45
  • This is very helpful oliv, thank you. But what if I have more than one records in the file? Should I use a for loop then? – Subhasree Mitra Aug 21 '18 at 07:16
  • @SubhasreeMitra No need for loop. `awk` will apply this script for each line of the input file. – oliv Aug 21 '18 at 07:40
  • Can you help there? Imagine, there are more than one records. – Subhasree Mitra Aug 21 '18 at 07:41
  • @SubhasreeMitra I can't help more... Just try the exact same script with your bigger file. – oliv Aug 21 '18 at 07:44
  • Yes I got that.. but this is coming as - sh Rydoo.sh Calculated Currency is CalculatedCurrency USD,INR,EUR USD,INR,EUR Calculated Actual Amount is 30.00 30.00 I dont want to have the values togather, i want in separate rows like in the record; might be appending additional field at the end of the record. Everything else in the record should be as it is. – Subhasree Mitra Aug 21 '18 at 07:45
  • @SubhasreeMitra What you're asking is unclear, please update your question with the _exact_ output format you want. – oliv Aug 21 '18 at 07:49
  • I have edited my original question adding the output format. – Subhasree Mitra Aug 21 '18 at 08:23
  • Thank you oliv.. I get below error - syntax error The source line is 1. The error context is <<< >>> BEGIN{FS=OFS="|"} awk: Quitting The source line is 1. – Subhasree Mitra Aug 21 '18 at 09:48
  • @SubhasreeMitra The script works for GNU awk (gawk). Check the version you have on your system. – oliv Aug 21 '18 at 09:51
0

Using bash with some arrays:

arr_find() {
        echo $(( $(printf "%s\0" "${@:2}" | grep -Fnxz "$1" | cut -d: -f1) - 1 ))
}
IFS='|' read -r -a headers
while IFS='|' read -r "${headers[@]}"; do
        IFS=',' read -r -a CalculatedCurrency <<<"$CalculatedCurrency"
        IFS=',' read -r -a CalculatedCurrencyAmount <<<"$CalculatedCurrencyAmount"

        idx=$(arr_find "$BranchCurrency" "${CalculatedCurrency[@]}")
        echo "BranchCurrency is $BranchCurrency. Hence CalculatedCurrency will be ${CalculatedCurrency[$idx]} and CalculatedCurrencyAmount will have to be ${CalculatedCurrencyAmount[$idx]}."

done

First I read all headers names. Then read all values into headers. Then read CalculatedCurrency* correctly, cause they are separated by ','. Then I find the element number which is equal to BranchCurrency inside CalculatedCurrency. Having the element index and arrays, I can just print the output.

KamilCuk
  • 120,984
  • 8
  • 59
  • 111
0

I know, the op asked for unix shell, but as an alternative option I show some code to do it using python. (Obviously this code can be heavily improved also.) The great advantage is readability, for instance, that you can address your data via name. Or the code, which is much more readable at all, than doing this with awk et al.

Save your data in data.psv, write the following script into a file main.py. I've tested it using python3 and python2. Both works. Run the script using python main.py.

Update: I've extended the script to parse all lines. In the example data, I've set BranchCurrency to EUR in the first line and USD in the secondline, as a dummy test.

File: main.py

import csv

def parse_line(row):
  branch_currency = row['BranchCurrency']
  calc_currency = row['CalculatedCurrency'].split(',')
  calc_currency_amount = row['CalculatedCurrencyAmount'].split(',')

  result = None
  for i, name in enumerate(calc_currency):
    result = calc_currency_amount[i] if name == branch_currency else result

  return result


def main():
  with open('data.psv') as f:
    reader = csv.DictReader(f, delimiter='|')
    for row in reader:
      print(parse_line(row))


if __name__ == '__main__':
  main()

Example Data:

[:~] $ cat data.psv 
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||35.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
61470003||35.00|null|USD|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46

Example Run:

[:~] $ python main.py 
30.00
35.20
ferdy
  • 7,366
  • 3
  • 35
  • 46
  • Thanks for the script.. but this is not working.. the error is as below - Reason: Script does not have a variable named 'hook', Resolution: Add a global variable named 'hook' to the script that is instantiated to the class that implements ScriptHook interface – Subhasree Mitra Aug 21 '18 at 07:35
  • Further extended to work on more than one line. Still, this script can be made very more pythonic, though. But it works, also with python2. – ferdy Aug 21 '18 at 07:59
  • Thanks ferdy, its working.. actually i am calling from snaplogic. So wrote your script in snaplogic format. But this logic is working for only the last record in the file, not for all. Please help. I have added the snaplogic python script in the original question. – Subhasree Mitra Aug 21 '18 at 08:31
  • To be honest, I've heard of "snaplogic" the first time in my life. – ferdy Aug 21 '18 at 09:36
  • @SubhasreeMitra - You need to have the global 'hook' variable in your script. – Bilesh Ganguly Nov 26 '19 at 21:17
0

You don't need to use the script snap here at all. Writing scripts for transformations all the time hampers the performance and defeats the purpose of an IPaaS tool altogether. The mapper should suffice.

I created the following test pipeline for this problem.

enter image description here

I saved the data provided in this question in a file and saved it in SnapLogic for the test. In the pipeline, I parsed it using a CSV parser.

enter image description here

Following is the parsed result.

enter image description here

Then I used a mapper for doing the required transformation.

enter image description here

Following is the expression for getting the actual amount.

$CalculatedCurrency.split(',').indexOf($BranchCurrency) >= 0 ? $CalculatedCurrencyAmount.split(',')[$CalculatedCurrency.split(',').indexOf($BranchCurrency)] : null

Following is the result.

enter image description here

Avoid writing scripts for problems that can be solved using mappers.

Bilesh Ganguly
  • 3,792
  • 3
  • 36
  • 58