This question continues from my previous question. Thanks to many people's help, I was able to modify my code as below.

import csv
with open("SURFACE2", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter=" ")
    writer = csv.writer(outfile, delimiter=" ")
    for row in reader:
        row[18] = "999"                  

        writer.writerow(row)

I just changed the delimiter from "\t" to " ". With the previous delimiter the code only worked up to row[0]; with " " it works up to row[18].

15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000   

In the data line above, row[18] falls in the middle, between 15.20000 and 120.60000.

I am not sure what happens between these two values. Maybe the delimiter changes? Visually, however, I can't see any difference. Is there any way to find out whether the delimiter changed and, if so, do you have any idea how to handle multiple delimiters in one piece of code?

Any idea or help would be really appreciated.

Thank you, Isaac


The results from repr(next(infile)):

'            15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'
'            55.10000            -3.60000 03154      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                 16.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-888888.00000      0     16.00000      0    281.20001      0    279.89999      0      0.00000      0      0.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'

As you can see, the first four lines should actually be one line. For some reason, the full line seems to be divided into 4 parts. Do you have any idea why? Thank you, Isaac

Isaac
  • Can you clarify what it means when you say "the code can now work until row[18]"? – Andrew Magee Feb 27 '15 at 03:50
  • I don't understand your question - what is the problem you are facing? – Burhan Khalid Feb 27 '15 at 03:50
  • @AndrewMagee It means that when I use row[18], the code works without errors, but when I use row[19] it shows a "list assignment index out of range" error. Thank you. – Isaac Feb 27 '15 at 03:55
  • Ok so maybe there are exactly 19 fields in the row (row[18] being the last one)? – Andrew Magee Feb 27 '15 at 03:56
  • @BurhanKhalid Sorry for the confusion. I would like to change all the values in a column, so row[18] means that I would like to replace the value in that column with 999. Up to row[18] this code works well, but when I try to change the 20th column with row[19], it shows an "index out of range" error. Thank you. – Isaac Feb 27 '15 at 03:58
  • There must be a row that just doesn't have that many columns. In your loop you can say `print(len(row))` to see how many columns there are in each row. – Andrew Magee Feb 27 '15 at 04:00
  • @AndrewMagee That is a possible reason, but as you can see from the data line above, this data line is very long. What I would actually like to change is at about row[50]. – Isaac Feb 27 '15 at 04:01
  • Right but the elements in the list `row` are fields, not characters. – Andrew Magee Feb 27 '15 at 04:02
  • @AndrewMagee Thank you Andrew, I will follow your advice. – Isaac Feb 27 '15 at 04:03
  • @AndrewMagee As you recommended, I looked at len(row). Interestingly, len(row) is different for each row, which should not be the case. Maybe csv.reader is not the correct tool. Could you please suggest another tool to use instead? Thank you. – Isaac Feb 27 '15 at 04:17
  • What does the line that you are having trouble with look like when you print with `repr()`? The code that you posted will work only for fields that are delimited by a single space character. This means that fields like "get data information here" will result in multiple fields. Where does the CSV file come from? Is there a spec for it? – mhawke Feb 27 '15 at 04:57
  • @mhawke Yes, some lines are shorter and do not include all columns. This is not actually a CSV file, but an ASCII file formatted in a specific way for atmospheric observations. – Isaac Feb 27 '15 at 05:29
  • @mhawke I put the "repr()" result below in an answer of my own, because the result, even part of it, is too long. Would you please comment on it? – Isaac Feb 27 '15 at 05:41
  • @Isaac did the data originally have tabs in it? Should the line really be, eg, `"15.20000\t120.60000\t98327\tget data information here..."`? – Andrew Magee Feb 27 '15 at 12:04
  • @AndrewMagee: no, I don't think so. The data comes from a weather observation data format known as "little_r". It is not a CSV file, more like a fixed width field padded with spaces. – mhawke Feb 27 '15 at 12:34
  • Ah. Well if it's fixed width then the `csv` module is definitely inappropriate. What you want to do is plug the field widths into a pattern like this: http://stackoverflow.com/a/4914089/223486 – Andrew Magee Feb 27 '15 at 12:39

2 Answers


N.B. The file format is discussed on page 19 of this document. This more-or-less agrees with the sample data.

EDIT

OK, after considering the various comments and additional answers, and rereading the original question, it seems that the file in question is not a CSV file. It is weather observation data formatted as "little_r", which uses fixed width fields padded with spaces. There is not much info available so I'm guessing, but each group of 4 lines seems to comprise a single observation. From your previous question it seems that you want to update the 3rd column in the first line? The other 3 lines would be skipped. Then update the 3rd column in the first line of the next set of 4 lines, etc., etc.

An example from the OP:

            15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
      1      0      0

The first 2 columns of the first line are (I'm guessing) the latitude and longitude for the observations. I have no idea what the 3rd column 98327 is, but this is the column that the OP wants to update (based on previous question).
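
If each observation really is a group of four lines (which, as said, is a guess from the sample and not something taken from the format spec), the lines can be collected into records before anything else is done with them. A minimal sketch; the `records` helper and the placeholder sample lines are made up for illustration:

```python
def records(lines, size=4):
    """Yield consecutive groups of `size` lines.

    A shorter final group (e.g. a truncated file) is yielded as-is
    rather than silently dropped.
    """
    group = []
    for line in lines:
        group.append(line)
        if len(group) == size:
            yield group
            group = []
    if group:
        yield group


# Placeholder lines standing in for two 4-line observations.
sample = ["header A\n", "data A\n", "end A\n", "tail A\n",
          "header B\n", "data B\n", "end B\n", "tail B\n"]

for group in records(sample):
    # group[0] is the line that holds latitude, longitude and the 3rd column
    print(group[0].rstrip())
```

Because `records` accepts any iterable of lines, an open file object could be passed in place of `sample`.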

It's not a CSV file, so don't process it as one. Instead, because there are fixed width fields, we know the offset and width of the field that needs to be updated. Based on the sample data the 3rd column occupies zero-based character positions 41 through 45, i.e. the slice line[41:46]. So, to update the data and write to a new file:

offset_col_3 = 41
length_col_3 = 5

with open('SURFACE2') as infile, open('output.txt', 'w') as outfile:
    for line_no, line in enumerate(infile):
        if line_no % 4 == 0:    # every 4th line starting with the first
            line = '{}{:>5}{}'.format(line[:offset_col_3], 999, line[offset_col_3+length_col_3:])
        outfile.write(line)
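
The same slicing can be wrapped in a small function, which makes it easy to try on an in-memory string before touching the real file. This is only a sketch: `replace_field` is a made-up helper, and `header` is a shortened stand-in built to have the same leading layout as the sample (12 spaces, latitude, 11 spaces, longitude, one space, 3rd column).

```python
def replace_field(line, offset, width, value):
    """Return `line` with the fixed-width field at [offset, offset + width) replaced.

    The new value is right-aligned within the field. A value that is too
    wide raises ValueError instead of shifting the rest of the line.
    """
    text = str(value)
    if len(text) > width:
        raise ValueError("value %r does not fit in %d characters" % (text, width))
    return line[:offset] + text.rjust(width) + line[offset + width:]


# Shortened stand-in for the start of a header line (illustrative only).
header = " " * 12 + "15.20000" + " " * 11 + "120.60000" + " 98327      get data\n"

updated = replace_field(header, 41, 5, 999)
print(repr(header[41:46]))
print(repr(updated[41:46]))
```

Checking the width up front matters for this kind of file: every later field is located by its position, so a replacement that changes the line length would corrupt all the columns after it.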

Original answer

Try reading line 20 (row[19]) (assuming no header line in the CSV file, otherwise line 21) from the file and inspecting it in Python:

with open("SURFACE2") as infile:
    for i in range(20):
        print(repr(next(infile)))

The last line displayed will be row[19] (line 20). If, for example, tabs are delimiters then you might see \t in between the columns of data. Compare the previous line to the last line to see if there is a difference in the delimiter used.

If you find that your CSV file is mixing delimiters, then you might have to split the fields manually.
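
One way to split manually, assuming the only delimiters are spaces and tabs, is a regular expression that treats any run of them as a single separator. The `split_mixed` helper and the sample line are made up for illustration:

```python
import re


def split_mixed(line):
    """Split on runs of spaces and/or tabs, ignoring leading/trailing whitespace."""
    return re.split(r"[ \t]+", line.strip())


# Made-up line that mixes tabs and spaces between values.
sample = "15.20000\t120.60000   98327 \t FM-12\n"
print(split_mixed(sample))
```

Note that this also breaks up text fields that contain spaces, such as "get data information here.", so it only suits data where no field has embedded whitespace.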

mhawke
  • He doesn't seem to be talking about row 18 but rather column 18 in a particular row. – Andrew Magee Feb 27 '15 at 04:05
  • @AndrewMagee. Oh, well if rows and columns are being confused then it's difficult to know what is being asked. – mhawke Feb 27 '15 at 04:06
  • @mhawke Sorry for the confusion; I would actually like to change the column, not the line or row. Thank you. – Isaac Feb 27 '15 at 04:08
  • @Isaac : no problem. This answer is still useful for you to inspect the data. – mhawke Feb 27 '15 at 04:43

The csv module is not the right tool to use when you have fixed-width fields in your file. What you need to do is explicitly use the field lengths to split up the lines. For example:

# This would be your whole file
data = "\n".join([
    "abc  def gh i",
    "jk   lm  n  o",
    "p    q   r  s",
])
field_widths = [5, 4, 3, 1]

def fields(line, field_widths):
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length

for line in data.split("\n"):
    print(list(fields(line, field_widths)))

will give you:

['abc', 'def', 'gh', 'i']
['jk', 'lm', 'n', 'o']
['p', 'q', 'r', 's']
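
As for why `csv.reader` with `delimiter=" "` behaved so strangely on this file: every single space counts as a delimiter, so each run of padding spaces turns into a run of empty fields. A small sketch using only the start of the header line from the question (12 spaces, latitude, 11 spaces, longitude, one space, station id):

```python
import csv

# Start of the first header line from the question.
line = " " * 12 + "15.20000" + " " * 11 + "120.60000" + " 98327"

# csv.reader accepts any iterable of strings, so a one-element list works.
row = next(csv.reader([line], delimiter=" "))

print(len(row))                # total number of fields, mostly empty strings
print(row.index("15.20000"))   # index of the latitude field
print(row.index("120.60000"))  # index of the longitude field
print(repr(row[18]))           # the field the question assigns "999" to
```

With this layout the latitude and longitude land at indexes 12 and 23, and everything in between is an empty string. That is why assigning to row[18] put 999 between the two values rather than in a real column.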
Andrew Magee