
So I have a .CSV file with possibly several million, maybe even billions of lines of data. The data is in the format below:

1,,5,6,7,82,4,6
1,4,4,5,6,33,4,
2,6,3,,6,32,6,7
,,,2,5,45,,6
,4,5,6,,33,5,6

What I am trying to achieve is this: let's assume each line of data is an "event". Let's call it that. Now let's say a user says, "show me all events where the 6th value is 33". As you can see above, the 6th data element is a 2-digit number, so if the user asks for all events where the 6th data element is 33, the output would be:

1,4,4,5,6,33,4,
,4,5,6,,33,5,6

Also, as you can see, the data can have blanks or holes where data is missing. I don't need help reading a .CSV file; I just can't wrap my mind around how I would access the 6th data element. I would also prefer the output to be represented in a collection of some sort. I'm new to C#, so I don't know much about the built-in classes. Any help will be appreciated!

Hossein Narimani Rad
sparta93

4 Answers


Instead of the term "event", I suggest calling this data structure by the more customary "rows and columns", and using the C# String.Split() method to build a 2D array (string[,] or int[,]) in which each element is conveniently accessible by its row/column index, so you can apply whatever business logic you need to those elements.

A possible implementation of the CSV file reader (reading line by line and storing each line in the List<string> listRows) is shown below (see the question "Reading CSV file and storing values into an array"):

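using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Read the file line by line, storing each raw line in listRows
        List<string> listRows = new List<string>();
        using (var reader = new StreamReader(File.OpenRead(@"C:\YourFile.csv")))
        {
            while (!reader.EndOfStream)
            {
                listRows.Add(reader.ReadLine());
            }
        }
    }
}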

Then apply the Split(',') method to each row stored in listRows to compose a 2D array string[,], and, if needed, use int.TryParse() to convert the values to int.
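A minimal sketch of that step, assuming the lines are already in listRows as above. The class and method names (CsvGrid, ToGrid, FindRows) are hypothetical. Rows shorter than the widest row are padded with empty strings, so holes in the data stay as empty cells, and int.TryParse simply skips blanks when searching.

```csharp
using System;
using System.Collections.Generic;

static class CsvGrid
{
    // Hypothetical helper: splits each row on ',' into a 2D array.
    // Rows shorter than the widest row are padded with "" (missing values).
    public static string[,] ToGrid(List<string> listRows)
    {
        var split = new List<string[]>();
        int columns = 0;
        foreach (string row in listRows)
        {
            string[] cells = row.Split(',');
            split.Add(cells);
            columns = Math.Max(columns, cells.Length);
        }

        var grid = new string[split.Count, columns];
        for (int r = 0; r < split.Count; r++)
            for (int c = 0; c < columns; c++)
                grid[r, c] = c < split[r].Length ? split[r][c] : "";
        return grid;
    }

    // Returns the indexes of rows whose cell in `column` parses to `value`.
    // Blank or non-numeric cells never match because int.TryParse returns false.
    public static List<int> FindRows(string[,] grid, int column, int value)
    {
        var matches = new List<int>();
        for (int r = 0; r < grid.GetLength(0); r++)
        {
            if (int.TryParse(grid[r, column], out int n) && n == value)
                matches.Add(r);
        }
        return matches;
    }
}
```

For example, CsvGrid.FindRows(CsvGrid.ToGrid(listRows), 5, 33) returns the indexes of the matching rows; column 5 is the 6th value because indexes are zero-based. Keep in mind that with millions or billions of lines, a full in-memory grid may not fit.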

Alternatively, this could be implemented with LINQ, which I don't recommend here because it unnecessarily widens the technology surface area and may degrade performance (a LINQ solution is expected to be slower than the direct processing suggested above).

Hope this helps.

Alexander Bell

This is pretty easy to achieve with LINQ. I'm posting a sample from LINQPad along with its output. All you need to do is replace 33 with a parameter:

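void Main()
{
    // LINQPad "C# Program" sample. Outside LINQPad, add using directives for
    // System.Collections.Generic, System.IO and System.Linq, and replace Dump().
    string csvFile = @"C:\Temp\TestData.csv";
    string[] lines = File.ReadAllLines(csvFile);

    var values = lines.Select(s => new { myRow = s.Split(',') });

    // and here is your collection representing results
    List<string[]> results = new List<string[]>();

    foreach (var value in values)
    {
        // Matches rows where ANY column equals "33"; to test only the 6th column use
        // value.myRow.Length > 5 && value.myRow[5] == "33"
        if (value.myRow.Contains("33"))
        {
            results.Add(value.myRow);
        }
    }

    results.Dump(); // LINQPad's output extension
}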

Output: [LINQPad results screenshot, not reproduced]

Or, if you want, you can do it all in one shot:

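string csvFile = @"C:\Temp\TestData.csv";
string[] lines = File.ReadAllLines(csvFile);

var values = lines
    .Select(s => s.Split(','))      // split each line only once
    .Select(row => new
    {
        // 1-based position of the first "33" (0 if not found);
        // == avoids substring matches such as "133"
        Position = Array.FindIndex(row, a => a == "33") + 1,
        myRow = row
    });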

The final result then contains both the position of your search value (33) and the complete string[] of items.
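Both snippets hard-code 33. A hedged sketch of replacing it with parameters, as suggested above, and restricting the match to a single column (the question asks about the 6th value, i.e. index 5). CsvFilter, FilterRows and FilterFile are hypothetical names. File.ReadLines is used instead of File.ReadAllLines so lines are read lazily rather than loaded all at once.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class CsvFilter
{
    // Hypothetical helper: yields split rows whose cell at columnIndex
    // (zero-based, so 5 = 6th value) equals value exactly.
    public static IEnumerable<string[]> FilterRows(
        IEnumerable<string> lines, int columnIndex, string value)
    {
        return lines
            .Select(s => s.Split(','))
            .Where(row => row.Length > columnIndex && row[columnIndex] == value);
    }

    // Convenience wrapper that streams lines from a file path
    // and materializes only the matching rows.
    public static List<string[]> FilterFile(string path, int columnIndex, string value)
    {
        return FilterRows(File.ReadLines(path), columnIndex, value).ToList();
    }
}
```

Usage would look like List<string[]> results = CsvFilter.FilterFile(@"C:\Temp\TestData.csv", 5, "33");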

Yuri

Create a class EventEntity containing a List<int>, with a constructor that initializes the list. Here is an example class:

    using System.Collections.Generic;

    public class EventEntity
    {
        public EventEntity()
        {
            EventList = new List<int>();
        }

        public List<int> EventList { get; set; }
    }

From there, loop through each row of data. For example:

using System;
using System.Collections.Generic;

public class EventEntityRepo
{
    public EventEntity GetEventEntityByCsvDataRow(string[] csvRow)
    {
        EventEntity events = new EventEntity();

        foreach (string csvCell in csvRow)
        {
            // -1 marks an empty or non-numeric cell
            int eventId = -1;

            if (!string.IsNullOrWhiteSpace(csvCell) &&
                int.TryParse(csvCell.Trim(), out int parsed))
            {
                eventId = parsed;
            }

            events.EventList.Add(eventId); // if an empty item, insert -1
        }

        return events;
    }
}

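Then you can reference the items whenever you want:

// csvDataRow is one CSV line split on commas, e.g. line.Split(',')
var repo = new EventEntityRepo();
EventEntity eventEntity = repo.GetEventEntityByCsvDataRow(csvDataRow);
int eventEntitySixthElement = eventEntity.EventList[5]; // 6th element (zero-based index 5)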

Andrew Grinder

So your question is how to access the 6th data element. It's not too hard if you have the right data structure representing your CSV.

Basically, in abstract terms, this CSV document can be described as IEnumerable<IEnumerable<String>>, or maybe IEnumerable<IEnumerable<int?>>. Once you have implemented the CSV parsing logic, you can access the 6th element of each row by executing:

var csvRepresentation = ParseCsv(@"D:/file.csv");
foreach (var row in csvRepresentation)
{
    var element = row.ElementAt(5); // 6th element (zero-based index 5)
    if (element == "33")
    {
        // do smth
    }
}

With this approach you will also be able to run LINQ queries on it. Now the question is how to implement ParseCsv():

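// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public IEnumerable<IEnumerable<String>> ParseCsv(string path)
{
    // File.ReadLines reads lazily, so a huge file is not loaded into memory at once
    return File.ReadLines(path).Select(row => row.Split(','));
}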
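The answer also mentions IEnumerable<IEnumerable<int?>>. A minimal sketch of that variant, assuming blank or non-numeric cells should become null; CsvNullable, ParseLines, ParseCsvNullable and ParseCell are hypothetical names. Because File.ReadLines is lazy, a LINQ query over the result can filter rows without holding the whole file in memory.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class CsvNullable
{
    // Hypothetical parser: each cell becomes an int, or null when blank/non-numeric.
    public static IEnumerable<IEnumerable<int?>> ParseLines(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split(',').Select(cell => ParseCell(cell)));
    }

    public static IEnumerable<IEnumerable<int?>> ParseCsvNullable(string path)
    {
        // File.ReadLines enumerates lazily instead of loading the whole file
        return ParseLines(File.ReadLines(path));
    }

    private static int? ParseCell(string cell)
    {
        return int.TryParse(cell, out int n) ? n : (int?)null;
    }
}
```

A query for the question's example could then be CsvNullable.ParseCsvNullable(@"D:/file.csv").Where(row => row.ElementAtOrDefault(5) == 33).Select(row => row.ToList()).ToList(); ElementAtOrDefault returns null for rows shorter than six values, and null never equals 33.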

Alex Sikilinda