I have a CSV file that looks like this:
,Location_Code,Location_Desc,Type_Code,Fault_type,Prod_Number,Model,Causer,Auditor,Prio,Capture_Date,Steer,Engine,Country,Current shift number,VIN,Comment,Shift,Year,Fault location C_Code,Fault location C_Desc,Fault type C_Code,Fault type C_Desc,Comment R,Baumuster Sales desc.,Baumuster Technical desc.,T24
0,09122,Engine,42,Poor fit,7117215,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0092,55SWF8DB7KU316971,,A,2019,,,,,,C 300,205 E20 G,
1,09122,Engine,42,Poor fit,7117235,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0122,55SWF8DB2KU316991,,A,2019,,,,,,C 300,205 E20 G,
2,09122,Transmission,42,Poor fit,7117237,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0126,55SWF8DB6KU316993,,A,2019,,,,,,C 300,205 E20 G,
I want to write code that tokenizes the values in a selected column (chosen by its header) and counts how often each word occurs, stored as dictionary-style key-value pairs. I also want the counts sorted by value in descending order, e.g. for the column Location_Desc:
Location_Desc
Engine: 2
Transmission: 1
This is the code I have so far:
int colNumber;
for (colNumber = 0; colNumber < columns.Length; colNumber++)
{
    if (columns[colNumber].Equals(columnHeader))
    {
        break;
    }
}
Debug.WriteLine("Column Number: " + colNumber);
for (int i = 0; i < inputCsv.Length; i++)
{
    string[] row = inputCsv[i].Split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
    string column = row[colNumber];
    Debug.WriteLine(row.ToString());
}
I was able to find the index of the column from its header name with a for loop, but I have two problems: the split does not ignore commas inside quotation marks, and I cannot get the values out of the selected column (what Python's Pandas would call a Series).
Help is much appreciated!
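For reference, here is a minimal sketch of the behaviour described above, not a definitive implementation. It rests on a few assumptions: `string.Split` treats its argument as a literal separator rather than a pattern, so the sketch uses `Regex.Split` instead; "words" are taken to be whitespace-separated tokens; and the names `ColumnWordCount` and `CountWords` are hypothetical. The group in the pattern is made non-capturing because `Regex.Split` otherwise adds captured text to the result array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnWordCount
{
    // Matches a comma only when an even number of double quotes follows it,
    // i.e. a comma that sits outside a quoted field.
    private static readonly Regex CsvSplitter =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: lines[0] is the header row, the rest are data rows.
    public static List<KeyValuePair<string, int>> CountWords(string[] lines, string columnHeader)
    {
        string[] headers = CsvSplitter.Split(lines[0]);
        int colNumber = Array.IndexOf(headers, columnHeader);
        if (colNumber < 0)
            throw new ArgumentException("Unknown column: " + columnHeader);

        var counts = new Dictionary<string, int>();
        foreach (string line in lines.Skip(1))
        {
            string[] row = CsvSplitter.Split(line);
            if (colNumber >= row.Length) continue; // skip short or malformed rows

            // Assumption: strip the surrounding quotes, then split on whitespace.
            string cell = row[colNumber].Trim('"');
            foreach (string word in cell.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                counts.TryGetValue(word, out int n); // n stays 0 for a new word
                counts[word] = n + 1;
            }
        }

        // Sort the key-value pairs by count, highest first.
        return counts.OrderByDescending(kv => kv.Value).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened version of the sample data, keeping one quoted field.
        string[] inputCsv =
        {
            ",Location_Code,Location_Desc,Auditor",
            "0,09122,Engine,\"Plant 1, WSA\"",
            "1,09122,Engine,\"Plant 1, WSA\"",
            "2,09122,Transmission,\"Plant 1, WSA\""
        };

        foreach (var kv in ColumnWordCount.CountWords(inputCsv, "Location_Desc"))
            Console.WriteLine(kv.Key + ": " + kv.Value);
        // Prints:
        // Engine: 2
        // Transmission: 1
    }
}
```

The regex approach does not unescape doubled quotes (`""`) inside quoted fields and cannot handle fields that span several lines, so for real-world files a dedicated CSV parser is probably the safer choice.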