
Here's the deal:

I was asked to develop a Java program that reorganizes .tsv files (moving cells around to perform a kind of transposition).

I tried to do it cleanly, and I now have 3 different packages.


Only tsvExceptions and tsvTranspositer are needed to make the main class (TSVTransposer.java) work.

Yesterday I learned that I would have to implement it myself in Talend, which I had never heard of.

While searching, I came across this Stack Overflow topic and followed its steps: I created a routine, copy/pasted my main class into it (changing the package to "routines"), and added the external libraries it needs (my two packages exported as JAR files, plus openCSV). Now, when I open the routine, no error is shown, but I can't drag and drop it into the job I created!


Nothing happens. It just opens the component information panel, which says "Properties not available."

```java
package routines;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

import tsvExceptions.ArgsExceptions;
import tsvExceptions.EmptyArgsException;
import tsvExceptions.OutOfBordersArgsException;
import tsvTranspositer.CommonLine;
import tsvTranspositer.HeadOfValuesHandler;
import tsvTranspositer.InputFile;
import tsvTranspositer.OutputFile;


public class tsvRoutine {


    public static void main(String[] args) throws ArgsExceptions {

        // Stays true while everything is good
        boolean everythingOk = true;

        String inputFile = null; // Name of the input file to be transposed.
        String outputFile = null; // Name of the output file.
        int serieNb = 1 ; // Number of columns before the actual values in the input file. Can be columns describing the product as well as empty columns before the values.
        int linesToCopy = 0; // Number of lines composing the header of the file (those lines will be copy/pasted into the output)

        /*
         * Handling the arguments first.
         */
        try {
            switch (args.length) {
            case 0:
                throw new EmptyArgsException();
            case 1:
                inputFile = args[0];
                // If no output file name is given, insert "Transposed" before the extension.
                // lastIndexOf copes with names such as "./data.tsv" or names without an extension,
                // where split("\\.") would give a wrong parts[0] or no parts[1].
                int dot = inputFile.lastIndexOf('.');
                outputFile = (dot > 0)
                        ? inputFile.substring(0, dot) + "Transposed" + inputFile.substring(dot)
                        : inputFile + "Transposed";
                break;
            case 2:
                inputFile = args[0];
                outputFile = args[1];
                break;
            case 3:
                inputFile = args[0];
                outputFile = args[1];
                serieNb = Integer.parseInt(args[2]);
                break;
            case 4:
                inputFile = args[0];
                outputFile = args[1];
                serieNb = Integer.parseInt(args[2]);
                linesToCopy = Integer.parseInt(args[3]);
                break;
            default:
                inputFile = args[0];
                outputFile = args[1];
                serieNb = Integer.parseInt(args[2]);
                linesToCopy = Integer.parseInt(args[3]);
                throw new OutOfBordersArgsException();

            }
        }
        catch (ArgsExceptions a) {
            a.notOk(everythingOk);
            // Java passes booleans by value, so notOk() cannot reset the local flag: do it here
            everythingOk = false;
        }
        catch (NumberFormatException n) {
            System.out.println("Arguments 3 & 4 should be numbers."
                    + " Number 3 is the number of columns before the actual values in the input file. \n"
                    + "(Can be columns describing the product as well as empty columns before the values. (1 by default)) \n"
                    + "Number 4 is the number of lines to copy/paste. (0 by default) \n"
                    + "Please try again.");
            everythingOk = false;
        }

        if (everythingOk) {
            // Creating an InputFile and an OutputFile only once the arguments are known to be valid
            InputFile ex1 = new InputFile(inputFile, linesToCopy);
            OutputFile ex2 = new OutputFile(outputFile);

            try (   FileReader fr = new FileReader(inputFile);
                    CSVReader reader = new CSVReader(fr, '\t', '\'', 0);
                    FileWriter fw = new FileWriter(outputFile);
                    CSVWriter writer = new CSVWriter(fw, '\t', CSVWriter.NO_QUOTE_CHARACTER))
            {

                ex1.setReader(reader);
                ex2.setWriter(writer);
                // Reading the header of the file
                ex1.readHead();
                // Writing the header of the file (copy/paste)
                ex2.write(ex1.getHeadFile());

                // Handling the line containing the column names
                HeadOfValuesHandler handler = new HeadOfValuesHandler(ex1.readLine(), serieNb);
                ex2.writeLine(handler.createOutputHOV());

                // Each line is read and written (as several output lines) one after the other.
                String[] row;
                CommonLine cl1;
                // If the period is monthly
                if (handler.isMonthly()) {

                    while (!ex1.isAllDone()) {

                        row = ex1.readLine();
                        if (!ex1.isAllDone()) {
                            cl1 = new CommonLine(row, handler.getYears(), handler.getMonths(), serieNb);

                            ex2.write(cl1.exportOutputLines());
                        }
                    }
                }
                // If the period is yearly
                else {

                    while (!ex1.isAllDone()) {

                        row = ex1.readLine();
                        if (!ex1.isAllDone()) {
                            cl1 = new CommonLine(row, handler.getYears(), serieNb);

                            ex2.write(cl1.exportOutputLines());
                        }
                    }
                }
            }
            catch (FileNotFoundException f) {
                System.out.println(inputFile + " can't be found. Cancelling...");
            }
            catch (IOException e) {
                System.out.println("Unexpected I/O exception raised.");
                e.printStackTrace();
            }

        }

    }
}
```

I know the exceptions aren't handled properly yet, but they are in a hurry to get it working in some form.

Another problem I will run into later: I have no idea how to pass the required arguments to the program.

Anyway, thanks for reading this post!

Community
Fitz

1 Answer

You cannot add routines to a job by drag and drop. You need to call the routine's functions from within components.

For example, you would start with a tFileList to get all the files you need. Then you could add a tFileInputDelimited in which you describe all the fields of your input. After that, in a component such as tJavaRow, you can write some code that calls your routine.

NOTE: Keep in mind that Talend usually works row-wise. This means your routines should process data row by row, which may also mean refactoring your code accordingly. A main method won't work; at a minimum, it has to become a class that can be instantiated or that has static methods.
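As a rough illustration of what "row-wise" means here, the hypothetical routine below exposes only a public static method and no `main`. The class name `TsvRowRoutine`, the method `transposeRow` and the row layout are all assumptions made for this sketch, not the question's `tsvTranspositer` code: it assumes the first `serieNb` cells of a row describe the series and that each later cell holds the value for the period named in the same column of the header line. In Talend, the file would also need `package routines;` at the top.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Talend-style routine: public static methods only, no main().
// (In Talend, the file would start with "package routines;".)
public class TsvRowRoutine {

    /**
     * Turns one "wide" input row into several "long" output rows.
     * Assumed layout: cells 0..serieNb-1 describe the series; every later
     * cell i holds the value for the period labelled header[i].
     */
    public static List<String[]> transposeRow(String[] row, String[] header, int serieNb) {
        List<String[]> out = new ArrayList<>();
        int n = Math.min(serieNb, row.length);
        // The descriptive cells are repeated at the start of every output row
        String[] descriptors = Arrays.copyOfRange(row, 0, n);
        for (int i = n; i < row.length && i < header.length; i++) {
            String[] line = Arrays.copyOf(descriptors, n + 2);
            line[n] = header[i];     // period label taken from the header line
            line[n + 1] = row[i];    // value for that period
            out.add(line);
        }
        return out;
    }
}
```

A tJavaRow could then call `routines.TsvRowRoutine.transposeRow(...)` on each incoming row. Since a tJavaRow emits one output row per input row, turning one row into several would need extra plumbing in the job; this sketch only covers the per-row logic.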

If you want to handle everything yourself, you could use a tJava component instead of a tJavaRow, which gives you more flexibility.
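If the whole file is processed in one go from a tJava, the routine can expose a single static method that takes the former command-line arguments as parameters. The sketch below is hypothetical: `TsvFileRoutine.transposeFile` is an invented name, it uses plain `java.nio` instead of opencsv (so it assumes tab-separated cells without quoting), and the `Period`/`Value` output column names are placeholders.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical routine meant to be called once from a tJava component.
// Assumption: plain tab-separated cells, no quoting (opencsv is not used here).
public class TsvFileRoutine {

    public static void transposeFile(String inputFile, String outputFile,
                                     int serieNb, int linesToCopy) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(inputFile), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(outputFile), StandardCharsets.UTF_8)) {

            // 1. Copy the first linesToCopy lines unchanged
            for (int i = 0; i < linesToCopy; i++) {
                String line = in.readLine();
                if (line == null) {
                    return;
                }
                writeLine(out, line);
            }

            // 2. The next line holds the column names; labels after serieNb are periods
            String headerLine = in.readLine();
            if (headerLine == null) {
                return;
            }
            String[] header = headerLine.split("\t", -1);
            writeLine(out, prefix(header, serieNb) + "Period\tValue"); // placeholder names

            // 3. Each data row becomes one output line per value column
            String line;
            while ((line = in.readLine()) != null) {
                String[] row = line.split("\t", -1);
                String descriptors = prefix(row, serieNb);
                for (int i = serieNb; i < row.length && i < header.length; i++) {
                    writeLine(out, descriptors + header[i] + "\t" + row[i]);
                }
            }
        }
    }

    // First n cells joined by tabs plus a trailing tab (empty string when n == 0)
    private static String prefix(String[] cells, int n) {
        int k = Math.min(n, cells.length);
        return k == 0 ? "" : String.join("\t", Arrays.copyOfRange(cells, 0, k)) + "\t";
    }

    private static void writeLine(BufferedWriter out, String line) throws IOException {
        out.write(line);
        out.newLine();
    }
}
```

From a tJava, the call could then read its inputs from context variables, e.g. `routines.TsvFileRoutine.transposeFile(context.inputFile, context.outputFile, context.serieNb, context.linesToCopy);`, where those four context variable names are assumptions. This is also one possible way to pass the program's required arguments.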

Still, it won't be as simple as adding the routine and having everything just work.

In fact, the whole program could become a job of its own. Talend generates all the Java code for you:

  • The parameters can become Context variables.
  • Checking that the numeric parameters really are numbers could be done in several ways, for example with a tPreJob and a tJava
  • The input file could be read with a tFileInputDelimited using a tab field separator
  • Then, every row will be processed either with a tJavaRow containing your custom code or with a tMap if it's not too complex
  • Afterwards, you can write the file with a tFileOutputDelimited component
  • Everything gets connected via right-click > Main to iterate over the rows
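For the number check mentioned in the list, a tJava placed in a tPreJob could call a small helper like the hypothetical one below. It assumes the numeric context variables arrive as `String` values (if they are declared with an integer type in Talend, part of this check may already be covered); `ContextArgsCheck` and `parseIntOrDefault` are made-up names, and the defaults mirror the original `main` (1 for `serieNb`, 0 for `linesToCopy`).

```java
// Hypothetical helper for a tJava inside a tPreJob.
// Assumption: the numeric context variables are received as Strings.
public class ContextArgsCheck {

    /**
     * Returns defaultValue when value is null or blank, the parsed int otherwise.
     * Throws IllegalArgumentException with a readable message when value is not a number.
     */
    public static int parseIntOrDefault(String name, String value, int defaultValue) {
        if (value == null || value.trim().isEmpty()) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    name + " should be a number, got \"" + value + "\"", e);
        }
    }
}
```

The idea is to fail early, before any file is read, much like the `everythingOk` flag in the original `main`.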

All exception handling is done by Talend. If you want to react to exceptions, you can use a component like tLogCatcher (and, for example, display what it catches with a tLogRow).

Hope this helps a bit to set the direction.

tobi6
  • This sure helps a bit. And scares me quite a bit too, thinking I would have to rework all my code from the last 3 days... I'm not sure the transformation can be done row-wise. To simplify, here is the TSV file I get as input: [link](https://postimg.org/image/c2g8p21o1/) and here is what I am supposed to produce as output: [link](https://postimg.org/image/vn5g4uzqv/). This is a very simple example; the usual TSV files are 15K lines long and end up being 1500K lines long. – Fitz Jun 29 '16 at 15:27
  • Happy to help. If my answer (or another one) helped you out, consider marking it as answered ([How to mark question as answered](http://meta.stackexchange.com/questions/147531/how-mark-my-question-as-answered-on-stackoverflow)). – tobi6 Jun 29 '16 at 15:29
  • Sorry for the repost, I pressed Enter before I had finished writing. Thanks for your help anyway! I'll reconsider what you said tomorrow! – Fitz Jun 29 '16 at 15:33
  • Ah okay, then you should also look into [Converting columns to rows with Talend](https://help.talend.com/display/KB/Converting+columns+to+rows). – tobi6 Jun 29 '16 at 15:44
  • The point is, the job I need to do really depends on the input. I have to read some rows and, depending on what has been read, do different things. For example, if the period is monthly, one more row has to be created, and the strings "20XXM0X" (whose position can change) have to be split and displayed in two different columns. I don't believe Talend can do this. But I'm quite the newbie. – Fitz Jun 30 '16 at 07:29
  • That can be done with either a **tMap** component or a custom **tJavaRow**, maybe also in combination with a routine. This heavily depends on a correct definition of your workflow, which has to happen beforehand. You might consider opening another question on how to design this in Talend *when providing a thoroughly planned workflow*. – tobi6 Jun 30 '16 at 07:42
  • Yes, you are right, I'm pretty off-topic. Thanks for the help overall, I will look into what you mentioned. – Fitz Jun 30 '16 at 07:44
  • Sorry, one last question. It has to be efficient too. My first version of the program took 18 min (15K lines input -> 1500K lines output) and now it takes about 1.6 s. Is the code generated by Talend efficient? – Fitz Jun 30 '16 at 08:07
  • Yes. It always depends on the amount of data and the infrastructure. There is also a *Big Data* edition of Talend. – tobi6 Jun 30 '16 at 09:48
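For the monthly case described in the comments, splitting a label such as "2016M03" into a year column and a month column is a small string operation that a routine can expose to a tMap expression or a tJavaRow. The sketch below assumes the label always has the form `<year>M<month>` and treats a label without `M` as a yearly period; `PeriodRoutine` and its methods are hypothetical names.

```java
// Hypothetical routine for splitting period labels; in Talend it would
// start with "package routines;".
public class PeriodRoutine {

    /**
     * Splits a monthly label such as "2016M03" into {"2016", "03"}.
     * A label without 'M' (assumed yearly, e.g. "2016") yields {"2016", ""}.
     */
    public static String[] splitPeriod(String period) {
        int m = period.indexOf('M');
        if (m < 0) {
            return new String[] {period, ""};
        }
        return new String[] {period.substring(0, m), period.substring(m + 1)};
    }

    // Convenience accessors, handy as one-line tMap expressions
    public static String year(String period) {
        return splitPeriod(period)[0];
    }

    public static String month(String period) {
        return splitPeriod(period)[1];
    }
}
```

In a tMap, the year and month output columns could then use expressions such as `routines.PeriodRoutine.year(row1.period)` and `routines.PeriodRoutine.month(row1.period)`, where `row1.period` is an assumed input column.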