I have a piped text file whose structure looks like this:
seatle|washington|13|35|1|NW|2005-05-10
sanfransisco|california|13|31|1|W|2005-05-10
chicago|illinois|10|33|1|C|2005-05-10
newyorkcity|newyork|9|32|2|E|2005-05-10
DC|DC|9|30|4|E|2005-05-10
miami|florida|9|20|4|SE|2005-05-10
dallas|texas|12|22|2|S|2005-05-10
In the above sample, the 5th
column is a category number. There can be n
number of categories (in above sample there are 3
categories - 1
,2
and 4
but the number can vary) in the dataset. Now what I want is to read the data from this input file
pivot the column 3
and 4
based on category in column 5
and create another output file
whose structure should look like this:
column 1|column 2|column 6|column 7|column3_catgory1|column3_catgory2|column3_catgory3|....|column4_catgory1|column4_catgory2|column4_catgory3|...
Thus the output file should have data from the input file column 1
,column 2
,column 6
,column 7
and then data of column 3
mentioned in the respective output column based on the value of column 5
followed by data of column 4
mentioned in the respective output column based on the value of column 5
. Thus for the above sample input the output file should look like this:
column 1 |column 2 |column 6|column 7 |column3_1|column3_2|column3_4|column4_1|column4_2|column4_4
seatle |washington|NW |2005-05-10|13 | | |35 | |
sanfransisco|california|W |2005-05-10|13 | | |31 | |
chicago |illinois |C |2005-05-10|10 | | |33 | |
newyorkcity |newyork |E |2005-05-10| |9 | | |32 |
DC |DC |E |2005-05-10| | |9 | | |30
As you can see that since there are 3
categories so column 3
and 4
of input are pivoted into three different columns in the output with there value mentioned in the appropriate column based on the category.For example, for row with seatle
since category was 1
so the values of column 3
and column 4
are mentioned in column3_1
and column4_1
respectively with other column 3 category values(column3_2 & column3_4) and column 4 category values (column4_2 & column4_4) empty.
Now this is the code that I have written to read the piped file and to determine how many categories are there but I am unable to proceed ahead of this.
category = []
read = open('file1.txt', 'r')
for line in read:
fields = line.split('|')
if fields[4] not in category:
category.append(fields[4])
category = map(int, category) #converts the category value to int for sorting
category.sort()
Can anyone help me how to proceed ahead of this and pivot the values appropriately?
NOTE: even though there is no header
in the input file, I want to add the header in output file like I have shown in my above sample output structure. Also in the output I don't want any extra space between between two column values as I have shown in the sample output data for the sample input.I have deliberately added those spaces to make it easier to view what I want.