I have text files containing tables which I want to put into a dataframe. Per file the column headers are the same, but the width is different depending on the content (because they contain names of different lengths for example).
So far I managed to get the index of the first character of the header, so I know where the column separation should be. As all files (about 100) have different widths I want to be able to insert this list in order to create a dataframe with the correct column width.
This is what the first two rows look like:
!Column1 !Column2 !Column3 !Column4 !Column5
Company with a $1,000,000 Yes Jack, Hank X
name
Company with. $2,000 No Rita, Hannah X
another name
What I tried so far:
(1)
pandas pd.read_fwf('file.txt', colspec(()) - This does work, but with colspec I have to put in the (from, start) indexes for each column. Not only would this be burdensome manually, but some files have 12 columns while others have 10.
(2)
pandas pd.read_fwf('file.txt', widhts(list)) - Here I can easily insert the list with locations of the column separations, but it does not seem to create a separation at those indexes. I do not exactly understand what is does.
Question:
I currently have a list of indexes of all the exclamation marks:
list = [0, 17, 30, 45, 58]
How can I use this list and separate the columns to convert the .txt file into a DataFrame?
Any other way to solve this issue is more than welcome!