
I am trying to build a Python script to parse a large amount of data. The data will be generated by an existing tool, parsed by the Python script, and written to an Excel sheet. I haven't yet decided what the input data should look like. Is there a particular format or pattern that would make parsing easier? My current idea is to use regular expressions to find markers in the raw output that identify blocks.

Is there a standard format that would make the parsing more robust? Regular expressions can only be relied on as long as the format of the input data doesn't change.

I believe regex is a bad idea because it is error-prone, which is why I am looking for other options. Unlike the usual scenario, I can also format or modify the raw data here, so I would like to know all the possible ways to make report generation easier.

athultuttu
  • 1
    You mean standards like CSV, JSON, XML, YAML, or HTML? In some cases regexes might be a [bad idea](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Quickbeam2k1 Aug 24 '17 at 12:24
  • The thing is, the raw data comes from a system that cannot write directly to an Excel sheet or produce formats like JSON. CSV or XML would be possible. If I format the data as CSV or XML, will that help with parsing and populating the Excel sheet? – athultuttu Aug 24 '17 at 12:30
  • I also think regex is a bad idea because it is error-prone, which is why I am looking for other options. Unlike the usual scenario, I can also format or modify the raw data here, so I would like to know all the possible ways to make report generation easier. – athultuttu Aug 24 '17 at 12:34
  • 1
    So you are asking a question about data you want to present somewhere, without explaining what you want to present, and you essentially don't know what the data will look like? What kind of answer do you expect? As mentioned, regex parsing won't work with XML, but XML tags might provide structure you could use in presenting your data. – Quickbeam2k1 Aug 24 '17 at 12:34
  • The data would be a table with a count for each row and column: ` Data for XX1 AAAA BBBB CCCC DDDD 1 2 3 3 Data for XX2 AAAA BBBB CCCC DDDD 2 3 2 2` This is a simplified illustration. – athultuttu Aug 24 '17 at 12:36
  • I want to know whether there are any existing packages or plugins that would directly convert data in some format into an Excel sheet, the way Excel reads CSV. The only constraint is that complex formats like JSON cannot be supported. We can add symbols or delimiter characters to the data, like the comma in CSV. – athultuttu Aug 24 '17 at 12:40
  • 1
    Software recommendations are off-topic on Stack Overflow! You might want to check [softwarerecs](https://softwarerecs.stackexchange.com/). Besides, you might want to look at the pandas library. – Quickbeam2k1 Aug 24 '17 at 12:42
  • I was looking for Python packages, but anything is appreciated. I will have a look at pandas. – athultuttu Aug 24 '17 at 12:44
  • 1
    pandas is a Python package. – Quickbeam2k1 Aug 24 '17 at 12:48

1 Answer


Python's standard library includes the `csv` module, which provides reader classes and a predefined Excel-compatible dialect for parsing CSV data. `csv.reader` yields each row as a list of strings, and `csv.DictReader` yields each row as a dict keyed by the header row. You can then pass those rows to one of the Python-Excel integration libraries. These are all third-party and likely open source, and are not included in the standard library. This is about as turnkey as you will find.
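As a rough illustration of that pipeline, here is a minimal sketch. It assumes the tool can be made to emit the counts from the comments as CSV with a header row; the `block` column name, the `read_counts` and `write_xlsx` helpers, and the choice of openpyxl as the third-party Excel library are all assumptions made for this example, not something stated in the question.

```python
import csv
import io

# Hypothetical CSV layout for the sample data in the comments:
# one header row, then one row of counts per block.
SAMPLE = """\
block,AAAA,BBBB,CCCC,DDDD
XX1,1,2,3,3
XX2,2,3,2,2
"""


def read_counts(stream):
    """Return a list of (block_name, {column: count}) tuples."""
    reader = csv.DictReader(stream)  # uses the "excel" dialect by default
    rows = []
    for record in reader:
        block = record.pop("block")
        counts = {name: int(value) for name, value in record.items()}
        rows.append((block, counts))
    return rows


def write_xlsx(rows, path):
    """Write the parsed rows to an .xlsx file (requires openpyxl)."""
    # Third-party dependency, imported lazily so that the parsing step
    # above works with the standard library alone.
    from openpyxl import Workbook

    if not rows:
        return
    workbook = Workbook()
    sheet = workbook.active
    sheet.append(["block"] + list(rows[0][1]))  # header row
    for block, counts in rows:
        sheet.append([block] + list(counts.values()))
    workbook.save(path)


rows = read_counts(io.StringIO(SAMPLE))
```

With a real file you would pass `open("report.csv", newline="")` instead of the `io.StringIO` wrapper, then call something like `write_xlsx(rows, "report.xlsx")`. Keeping the parsing separate from the Excel step means the input format can change without touching the code that writes the sheet.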

PaulMcG