
I'm new to Python on OSX and need to plot data from two columns within a .txt file. On Windows I used the 'x[:,0]' syntax to select columns, though this does not seem to work on Mac. I have tried the following:

f = open(os.path.expanduser("~/Desktop/a.txt.rtf"))

lines=f.readlines()

result=[]

for x in lines:
    result.append(x.split(' ')[0])

for y in lines:
    result.append(y.split(' ')[1]) 

f.close()

plt.plot(x,y)
plt.show()

But it says that the list index is out of range, even though the test file just reads:

1  2
3  4
5  6
7  8

How can that be? Please help!

After solving this I need to know the Mac alternative to the "skip_header =" function (as the file I want to use has the data I need starting 25 rows down...)

Thanks in advance, and sorry if these are easy queries but I just can't make it work :(

mrCarnivore
OGTW
  • Thanks for your comment! The full error message is "list index out of range", referring to the line containing [1]. – OGTW Jan 24 '18 at 15:41
  • I know. Tracebacks contain more valuable information than the pure error message, like in which module the error happened or on which line. This is the reason, SO asks you to post the full traceback in the question. I suggest, you edit your question and add this information. – Mr. T Jan 24 '18 at 15:45
  • I can't make sense of the full traceback, so I'll attempt to learn the OSX approach from elsewhere. I do hope others first posts' on stack overflow are not met with such keyboard aggression. – OGTW Jan 24 '18 at 15:57
  • Whoa there OGTW. It can feel like that, but this is helpful instruction you're getting on the basic skills of posting questions and having them resolved. What you received from Piinthesky is helpful instruction in this instance, not bullying or criticism. As newbies here, we have all been through it. The request to put the full traceback information in the body of the question is standard. People read the question, not often the comments, so that is where the info needs to be. Be patient, hope you stick around. – Cam_Aust Jan 28 '18 at 06:28
  • Further comment. I like your question and you have composed and presented it well overall. Most newcomers do not grasp the notion of code blocks, for example. Well done. – Cam_Aust Jan 28 '18 at 06:30
  • Hi Cam_Aust, I was referring to a previous comment by Piinthesky that he/she has since deleted. The comment suggesting to include the full traceback is one I have no issues with at all. Thanks for your comment, I will indeed be sticking around as this site is a massive help in learning. Thanks! – OGTW Jan 29 '18 at 09:52

1 Answer


This is not an easy question at all. It is a very good question, and many people face the same problem in their daily work. Your question will help others as well!

The error occurs because you are trying to read a so-called Rich Text Format (RTF) file. The real content of the file is not what you see on screen, but text mixed with formatting control codes.

Instead of

['1  2', '3  4',...]

f.readlines() actually returns something like

['{\\rtf1\\adeflang1025\\ansi\\ansicpg1252\\uc1\\adeff31507\\deff0\\stshfdbch31505\\stshfloch31506\\stshfhich31506\\stshfbi31507\\...]

Therefore, when you try to index the split line, you get an "index out of range" error, because the line does not contain the two space-separated fields you expect.
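To make the failure concrete, here is a minimal sketch. The RTF header string below is a shortened, made-up example (an assumption, not the actual content of your file); it only illustrates why indexing with [1] fails.

```python
# Shortened, illustrative RTF header line (an assumption, not the real file).
rtf_line = "{\\rtf1\\ansi\\ansicpg1252\\uc1"
plain_line = "1  2"

# The RTF header contains no spaces, so split(' ') yields a single element.
rtf_parts = rtf_line.split(' ')

# split(' ') on two spaces produces an empty string in the middle,
# while split() with no argument splits on any run of whitespace.
spaced_parts = plain_line.split(' ')
plain_parts = plain_line.split()

try:
    rtf_parts[1]
    error = None
except IndexError as exc:
    error = str(exc)

print(rtf_parts, spaced_parts, plain_parts, error)
```

Note that even on a plain-text line, split(' ') is fragile when the columns are separated by more than one space, which is why split() without an argument is usually the safer choice here.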

There are three ways to solve this problem. First, you can convert the RTF file to plain text and read the text file with readlines() as you did. Second, you can read the RTF with a third-party parser. Third, you can parse the RTF yourself with regular expressions. Here are some useful links:

convert RTF

parse RTF
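Before choosing one of these routes, it may help to confirm that the file really is RTF. The sketch below relies on the fact that RTF files begin with the characters {\rtf; the helper name looks_like_rtf and the temporary demo files are hypothetical, used only for illustration.

```python
import os
import tempfile


def looks_like_rtf(path):
    """Return True if the file starts with the RTF signature '{\\rtf'."""
    with open(path, "rb") as f:
        return f.read(5) == b"{\\rtf"


# Demo on two throwaway files (made-up content, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    rtf_path = os.path.join(tmp, "a.txt.rtf")
    txt_path = os.path.join(tmp, "a.txt")
    with open(rtf_path, "w") as f:
        f.write("{\\rtf1\\ansi 1  2}")
    with open(txt_path, "w") as f:
        f.write("1  2\n3  4\n")
    rtf_result = looks_like_rtf(rtf_path)
    txt_result = looks_like_rtf(txt_path)

print(rtf_result, txt_result)
```

If the check reports RTF, re-saving the file as plain text (in TextEdit this should be available under Format > Make Plain Text) is probably the simplest fix.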

Hope it is helpful.

Update

Although it is not entirely clear what you want to plot, I guess you want to plot the 2nd column of your data file against the 1st. If so, you need to modify your code a bit: in your version, plt.plot(x, y) receives the loop variables x and y, which are just the last line of the file as strings, rather than lists of values. Below is an example.

Assume your a.txt file (plain text, not RTF) has the content

1  2
3  4
5  6
7  8

You can do this to draw an x-y plot with the 1st column as x and the 2nd column as y.

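import os
import matplotlib.pyplot as plt

# Read all lines from the plain-text file (not RTF)
f = open(os.path.expanduser("a.txt"))
lines = f.readlines()

x, y = [], []

for line in lines:
    # split() with no argument splits on any run of whitespace
    cols = line.split()
    # convert to float so the values are plotted as numbers, not labels
    x.append(float(cols[0]))
    y.append(float(cols[1]))

f.close()

print(x, y)

plt.plot(x, y)
plt.show()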

Or, with the column extraction written as a one-liner:

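import os
import matplotlib.pyplot as plt

f = open(os.path.expanduser("a.txt"))
lines = f.readlines()

# Transpose rows into columns; each column becomes a tuple of floats
x, y = zip(*(map(float, line.split()) for line in lines))

f.close()

print(x, y)

plt.plot(x, y)
plt.show()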
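If the real data starts some rows down, as mentioned in the question, the same approach can skip the header lines before parsing. The following is only a sketch: the helper read_columns and its skip_header parameter are hypothetical names (the parameter name is borrowed from the option mentioned in the question), and the sample lines are made up.

```python
from itertools import islice


def read_columns(lines, skip_header=0):
    """Return (x, y) float lists from the first two whitespace-separated columns."""
    x, y = [], []
    # islice skips the first `skip_header` lines without loading them into a list
    for line in islice(lines, skip_header, None):
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore blank or incomplete lines
        x.append(float(parts[0]))
        y.append(float(parts[1]))
    return x, y


# Made-up sample: two header lines, then data with one blank line in between.
sample = ["header line\n", "another header\n", "1  2\n", "3  4\n", "\n", "5  6\n"]
x, y = read_columns(sample, skip_header=2)
print(x, y)
```

Because a file object is itself an iterable of lines, the helper could be called as read_columns(f, skip_header=25) inside a with open(...) as f: block, and the resulting lists passed to plt.plot(x, y) as above.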
englealuze
  • Thank you for your response, and for your reassurance that this matter is not so trivial! I converted the file to plain text and I have a new error message: Unrecognized character 7 in format string. I am yet to Google the error, but any guidance will be appreciated. Thanks. – OGTW Jan 24 '18 at 16:13
  • I guess this might be due to an error in decoding. Try a different converter to see if anything changes... – englealuze Jan 24 '18 at 16:30
  • Hi englealuze, I tried a different converter and the same error was returned. I'm astounded that a simple task such as this is presently so challenging... – OGTW Jan 29 '18 at 09:56
  • Hi @OGTW, then it is not because of decoding. Further evidence is that, in the error message you get, the character seems to be correctly decoded as 7: "Unrecognized character 7...". So I think the plt.plot(x,y) call might be doing something you are not expecting. What is this module and method exactly? – englealuze Jan 29 '18 at 11:12
  • @OGTW and you may want to change your code a bit, see my update – englealuze Jan 29 '18 at 12:11
  • Thank you englealuze, it now runs perfectly! The simple text file was just for 'stepping stones' purposes, I have a file with many 100s of columns with which I need to plot a variety of graphs. I just need the code to skip over a certain number of rows in the .txt file before reading data: with numpy this is simply achieved with "skip_header = #", but I can't discover how this is achieved on os.path. Any final help will be much appreciated! – OGTW Jan 31 '18 at 16:09
  • @OGTW The readlines() method returns a list of lines, so if you know the number of lines you want to skip at the beginning, you can simply slice the list: lines_after_5thline = f.readlines()[5:]. This skips the first 5 lines and gives you everything from the 6th line onward. – englealuze Jan 31 '18 at 16:55