I have been extracting text from a CSV file and storing it in a string. Now I would like to extract only specific columns and store their data in a string. The wordDocContents variable should contain only the columns bank_account, bank_name, and customer_name, along with their data. Currently, wordDocContents holds the entire contents of my CSV file. Is there a way to filter out the specific columns and their data and store the result in wordDocContents? Thanks.

Here is what I have tried so far:

public void button1Clicked(object sender, EventArgs args)
{
    button1.Text = "You clicked me";

    var textExtractor = new TextExtractor();

    var wordDocContents = textExtractor.Extract("t.csv");
    Console.WriteLine(wordDocContents);
    Console.ReadLine();
}

The contents of wordDocContents:

ACCOUNT_NUMBER,CUSTOMER_NAMES,VALUE_DATE,BOOKING_DATE,TRANSACTION,ACCOUNT_TYPE,BALANCE_TYPE,REFERENCE,MONEY.OUT,MONEY.IN,RUNNING.BALANCE,BRANCH,EMAIL,ACTUAL.BALANCE,AVAILABLE.BALANCE
1000000001,TEST,,2847899,KES,Account,,,10/10/2016,9/11/2016,15181800,UPPER HILL BRANCH,another@yahoo.com,5403.75,5403.75,
1000000001,,9/11/2016,9/11/2016,Opening Balance,,,,,,4643.22,,,,,
1000000001,,12/10/2016,12/10/2016,Mobile Mpesa Transfer,,,,1533,,3110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29006E+11,1000,,2110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29118E+11,2000,,110.22,,,,,
1000000001,,17-10-2016,17-10-2016,Mobile Mpesa Transfer,,,,2083,,-1972.78,,,,,
1000000001,,17-10-2016,17-10-2016,Transfer from Mpesa,,,,0,4000,2027.22,,,,,
1000000001,,18-10-2016,18-10-2016,Mobile Mpesa Transfer,,,,333,,1694.22,,,,,
  • you can loop through it and select only particular columns – Ameya Deshpande Feb 02 '17 at 05:08
  • @AmeyaDeshpande:- Can you kindly provide a small rough code or something that can explain the process? –  Feb 02 '17 at 05:10
  • can you put what you get in `wordDocContents` – Ameya Deshpande Feb 02 '17 at 05:13
  • @AmeyaDeshpande:- I have added the contents in wordDocContents. –  Feb 02 '17 at 05:15
  • `http://stackoverflow.com/questions/21991609/how-do-i-delete-certain-column-from-csv-file` this will give you some idea.. – Ameya Deshpande Feb 02 '17 at 05:23
  • Does the TextExtractor class offer you any options, or does it only have the Extract() method? – Nick Feb 02 '17 at 05:40
  • @Nick:- The TextExtractor class doesn't have an option to select specific columns. –  Feb 02 '17 at 05:43
  • @Jessica - where is the TextExtractor from and what other functions does it have? – Nick Feb 02 '17 at 05:46
  • @Nick:- It's the TikaOnDotNet TextExtractor –  Feb 02 '17 at 05:54
  • @Jessica - any reason you're using that tool? There's a CSV parser in .NET Framework now, the TextFieldParser class. Did you check out some of the alternatives on http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-with-header – Nick Feb 02 '17 at 06:54

1 Answer

Based on how CSV files are structured, you can split the text into lines and then split each line into columns. (Maybe post the first 2 lines of your output?)

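// Split into lines (handles \r\n and \n), then split every line into its fields.
string[] lines = wordDocContents.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
string[] columns = lines[0].Split(',');        // header names
string[][] data = new string[lines.Length][];  // jagged array: one string[] per line
for (int i = 0; i < lines.Length; i++)
    data[i] = lines[i].Split(',');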

Now let's say customer_name is under columns[2]; you can collect that column like this:

List<string> customerNames = new List<string>();
// Start at 1 to skip the header row.
for (int i = 1; i < lines.Length; i++) {
    customerNames.Add(data[i][2]);
}

Edit: I just saw the output, so this code might need some adjusting for your particular case. By default, string.Split keeps an empty entry for each pair of consecutive commas (unless you pass StringSplitOptions.RemoveEmptyEntries), so the column positions should stay aligned. Just change the [2] to whichever column you need.

The column indexes start at zero: [0], [1], [2], and so on.
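
To put the selected columns back into a single string, which is what the question asks for, the split-and-index idea above could be wrapped up roughly as follows. This is only a sketch under some assumptions: the class name CsvColumnFilter and the method SelectColumns are hypothetical names made up for illustration, the columns are looked up by header name rather than by a hard-coded index, and the CSV is assumed to be simple (no quoted fields and no commas inside values).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of TikaOnDotNet or the .NET Framework.
public static class CsvColumnFilter
{
    // Returns CSV text containing only the requested columns, in the requested order.
    // Assumption: simple CSV with no quoted fields and no commas inside values.
    public static string SelectColumns(string csvText, params string[] wanted)
    {
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return string.Empty;

        string[] header = lines[0].Split(',');

        // Map each requested column name to its position in the header (case-insensitive).
        var indexes = new List<int>();
        foreach (string name in wanted)
        {
            int idx = Array.FindIndex(header,
                h => string.Equals(h.Trim(), name, StringComparison.OrdinalIgnoreCase));
            if (idx < 0) throw new ArgumentException("Column not found: " + name);
            indexes.Add(idx);
        }

        var output = new List<string>();
        foreach (string line in lines)
        {
            // Split(',') keeps empty entries, so field positions match the header.
            string[] fields = line.Split(',');
            // Rows shorter than the header yield empty strings instead of throwing.
            output.Add(string.Join(",",
                indexes.Select(i => i < fields.Length ? fields[i].Trim() : "")));
        }
        return string.Join(Environment.NewLine, output);
    }
}
```

With the sample data from the question, a call might look like wordDocContents = CsvColumnFilter.SelectColumns(wordDocContents, "ACCOUNT_NUMBER", "CUSTOMER_NAMES", "BRANCH"); the column names here are taken from the posted header row, so substitute whichever headers your file actually uses. If the file can contain quoted fields, a real CSV parser is a safer choice than Split.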

Jasoon