I'm working on a project that involves building an RDBMS of the US federal code in a specific format. I obtained the whole code from the official source, which is not well structured. Using some code from GitHub, I have managed to scrape the US Code into text files in the format shown below.
Can a Python script convert these text files into a CSV or other flat file with the schema below?
I'm new to Python, but I've been told this can be done easily with it.
The end output would be a flat file or a CSV file with the schema below:
Example:
**Title | Text | Chapter | Text | Section | Text | Section text**
1 | GENERAL PROVISIONS | 1 | RULES OF CONSTRUCTION | 2 | "County" as including "parish", and so forth | The word "county" includes a parish, or any other equivalent subdivision of a State or Territory of the United States.
The input is a text file with data that looks like the sample below.
Sample data:
-CITE-
1 USC Sec. 2 01/15/2013
-EXPCITE-
TITLE 1 - GENERAL PROVISIONS
CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD-
Sec. 2. "County" as including "parish", and so forth
-STATUTE-
The word "county" includes a parish, or any other equivalent
subdivision of a State or Territory of the United States.
-SOURCE-
(July 30, 1947, ch. 388, 61 Stat. 633.)
-End-
-CITE-
1 USC Sec. 3 01/15/2013
-EXPCITE-
TITLE 1 - GENERAL PROVISIONS
CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD-
Sec. 3. "Vessel" as including all means of water transportation
-STATUTE-
The word "vessel" includes every description of watercraft or
other artificial contrivance used, or capable of being used, as a
means of transportation on water.
-SOURCE-
(July 30, 1947, ch. 388, 61 Stat. 633.)
-End-
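
For reference, here is a minimal sketch of the kind of script I have in mind, not a tested solution for the full US Code. It assumes every record starts with `-CITE-` and ends with `-End-`, that marker lines always look like `-NAME-`, and that the title, chapter, and section lines follow the `TITLE n - text`, `CHAPTER n - text`, and `Sec. n. text` patterns seen in the sample. The names `parse_records` and `to_row` and the column names in `HEADER` are my own placeholders.

```python
import csv
import io
import re

# Patterns assumed from the sample data; real files may need more cases.
MARKER = re.compile(r"^-([A-Za-z]+)-$")
TITLE = re.compile(r"^TITLE\s+(\S+)\s+-\s+(.+)$")
CHAPTER = re.compile(r"^CHAPTER\s+(\S+)\s+-\s+(.+)$")
HEAD = re.compile(r"^Sec\.\s+(\S+)\.\s+(.+)$")

# Hypothetical column names for the seven-field schema.
HEADER = ["title", "title_text", "chapter", "chapter_text",
          "section", "section_heading", "section_text"]


def parse_records(lines):
    """Yield one {marker: [lines]} dict per -CITE- ... -End- record."""
    record, current = None, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = MARKER.match(line)
        if m:
            name = m.group(1).upper()
            if name == "END":
                if record is not None:
                    yield record
                record, current = None, None
                continue
            if name == "CITE":
                record = {}  # a new record begins
            if record is not None:
                current = record.setdefault(name, [])
            continue
        if current is not None:
            current.append(line)


def to_row(record):
    """Flatten one parsed record into the seven output fields."""
    title, chapter = ("", ""), ("", "")
    for line in record.get("EXPCITE", []):
        m = TITLE.match(line)
        if m:
            title = m.groups()
            continue
        m = CHAPTER.match(line)
        if m:
            chapter = m.groups()
    head = " ".join(record.get("HEAD", []))
    m = HEAD.match(head)
    section = m.groups() if m else ("", head)
    # Wrapped statute lines are joined back into a single line of text.
    text = " ".join(record.get("STATUTE", []))
    return [*title, *chapter, *section, text]


SAMPLE = """\
-CITE-
1 USC Sec. 2 01/15/2013
-EXPCITE-
TITLE 1 - GENERAL PROVISIONS
CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD-
Sec. 2. "County" as including "parish", and so forth
-STATUTE-
The word "county" includes a parish, or any other equivalent
subdivision of a State or Territory of the United States.
-SOURCE-
(July 30, 1947, ch. 388, 61 Stat. 633.)
-End-
"""

rows = [to_row(r) for r in parse_records(SAMPLE.splitlines())]

# Write pipe-delimited output; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerow(HEADER)
writer.writerows(rows)
print(buffer.getvalue())
```

Because the section text contains double quotes, `csv.writer` will wrap those fields in quotes and double the embedded ones by default, which may or may not be what the database import expects. For real input, `SAMPLE.splitlines()` could be replaced by an open file object, since `parse_records` only needs an iterable of lines.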