
I have the output of a code library that generates the energies of "bands" in materials. The data is organised in two columns: the left-hand column is an index and the right-hand column is the energy at that index. When I plot it in gnuplot I get this:

[Image: gnuplot output of the data]

The data is organised in the text file as follows:

0   -3.2101962802476773
0   -3.2101962802476773
0   -3.2101962802476773
0   -2.8612484511071283
0   -2.8612484511071212
0   -2.855472070340414
0   -2.855472070340414
0   -2.8473558653791424
1   -3.2098593700677056
1   -3.2098593700677056
1   -2.871177955425835
1   -2.871177955425834
1   -2.8651192493631106
1   -2.865119249363109
1   -2.846669223652509
1   -2.846669223652504
2   -3.209297896654713
2   -3.209297896654713
2   -2.8811028573856685
2   -2.881102857385668
2   -2.8750382428650094
2   -2.875038242865009
2   -2.8460674384836837
2   -2.846067438483675   

I can (somewhat) see the structure in this plot, though the colours make it difficult. What I'd like to do, using Python and matplotlib, is match the first row of data to the first row after the value in the left column changes (so pair the first (x, y) row with the first row where x is 1), and likewise for the second, third, fourth rows, and so on. I would then plot each of these collections of x and y values as a line on the same graph. Is there a good programmatic way to do this? Is there also a good way to make sure each line (there will be 100) is a different colour, or otherwise distinguishable from the others?

I thought of writing a loop that packs the first of every group of (x, y) tuples into key-value pairs in a dict, and then plotting all of these dicts on one graph, but I'm not entirely sure of the syntax for this.

Thank you so much!

Edit

Added the code given by the OP in a comment:

with open('BANDS.OUT', 'r') as f:
    lines = f.readlines()
    x = [float(line.split()[0]) for line in lines]
    y = [float(line.split()[1]) for line in lines]
ndclt
  • "match up the first row of data to the first row (match a row... to the same row?) where the value of that column (which column?!) has changed (changed compared to what?)" - could you please clarify this? – ForceBru Aug 15 '19 at 12:17
  • I've added some edits to show the data so it's a bit more clear. Basically I want the first row (0 -3.2101962802476773) to go into a dict with the first row where x has changed (1 -3.2098593700677056) and so on, and then repeat that for the second row starting at 0, and continuing on. Does that make sense? Hard to explain over text! – GappedState Aug 15 '19 at 12:29
  • Okay, so what code have you written to do that and where exactly are you stuck? – ForceBru Aug 15 '19 at 12:33
  • So far I have `with open('BANDS.OUT', 'r') as f: lines = f.readlines() x = [float(line.split()[0]) for line in lines] y = [float(line.split()[1]) for line in lines]` I'm stuck on writing the loop that indexes each row of x and y that matches the correct values to each other. (sorry for the formatting) – GappedState Aug 15 '19 at 12:41
  • Why don't you use pandas to read this file? `pd.read_csv('path/to/file.txt', sep='\t')` See this [answer](https://stackoverflow.com/a/20556913) for a quick plot. – ndclt Aug 15 '19 at 13:40
  • @ndclt I tried that originally but it reads both columns as a single one for some reason. Possibly because the columns are separated by 3 spaces. – GappedState Aug 15 '19 at 14:06
  • Actually got it loading in Pandas now. But still not sure how I'll associate the right rows with each other – GappedState Aug 15 '19 at 14:09
  • After reading with `read_csv`, you can do this: `expression = re.compile(r'\s+')` and `df1 = ( pd.read_csv('/path/to/filename') .iloc[:, 0] .str.replace(expression, '\t') .str.split('\t', expand=True) .drop([2], axis='columns') .rename({0: 'integer', 1: 'float'}, axis='columns') # better name to be chosen .assign(integer=lambda x: pd.to_numeric(x['integer']), float=lambda x: pd.to_numeric(x['float'])) )` – ndclt Aug 15 '19 at 14:34
  • What will that do? I get errors relating to [2] not being found in axis. Using a custom separator I was able to load the data into pandas however. Now I want to associate the first row to every Nth row after that, the 2nd row to every Nth row after that etc. and plot them together. I imagine I'd be using some sort of apply here? – GappedState Aug 15 '19 at 14:43
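
For reference, here is a minimal sketch of the loading step discussed in the comments above. It assumes the file has no header row and that the columns are separated by runs of spaces. The column names `x` and `y`, the extra `band` column, and the inline sample text standing in for BANDS.OUT are illustrative assumptions rather than anything from the original post.

```python
import io
import pandas as pd

# Small inline sample standing in for BANDS.OUT (values copied from the question).
sample = """\
0   -3.2101962802476773
0   -2.8612484511071283
0   -2.855472070340414
1   -3.2098593700677056
1   -2.871177955425835
1   -2.8651192493631106
"""

# sep=r"\s+" splits on any run of whitespace, so a three-space gap is handled.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=["x", "y"])

# Number the rows within each x value: the first row of every x gets band 0,
# the second gets band 1, and so on. Rows sharing a band number belong together.
df["band"] = df.groupby("x").cumcount()
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the file path. This only pairs rows up correctly if the rows within each x value appear in the same order throughout the file.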

1 Answer


Try this after loading the data into a DataFrame named df (I hope I understood your rearrangement correctly):

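import pandas as pd

df.columns = ['x', 'y']
df_new = pd.DataFrame()

# Number of distinct x values (the loop assumes x runs 0, 1, ..., lendf - 1)
lendf = len(df['x'].drop_duplicates())

for i in range(lendf):
    # Append the rows with x == i as two new columns, then drop the appended x column
    df_new = pd.concat([df_new, df[df['x'] == i].reset_index(drop=True)], axis=1, ignore_index=True)
    df_new.drop(df_new.columns[i], axis=1, inplace=True)

# One column per x value; row k holds the k-th energy listed for that x
df_new.columns = list(range(lendf))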

You can then transpose df_new (df_new.T), so that each row is one x value and each column is one line to draw, and plot the data.
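
As a rough sketch of the plotting step, and of one way to give each line its own colour, the example below uses `groupby(...).cumcount()` with `pivot` instead of the loop above. This is a swapped-in alternative, not the method from this answer. The tiny DataFrame, the `band` column name, and the choice of the viridis colormap are all assumptions made for illustration, and the rows within each x value are assumed to be in a consistent order.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Tiny stand-in for the real data: 3 x values with 3 energies each (rounded).
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "y": [-3.21, -2.86, -2.85, -3.20, -2.87, -2.86, -3.19, -2.88, -2.87],
})

# Position of each row within its x group, used as the band number.
df["band"] = df.groupby("x").cumcount()

# Reshape so that rows are x values and columns are bands.
bands = df.pivot(index="x", columns="band", values="y")

# Sample one colour per band from a continuous colormap.
colours = plt.cm.viridis(np.linspace(0, 1, bands.shape[1]))

fig, ax = plt.subplots()
for band, colour in zip(bands.columns, colours):
    ax.plot(bands.index, bands[band], color=colour, label=f"band {band}")
ax.set_xlabel("index")
ax.set_ylabel("energy")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure. With around 100 lines a legend is unlikely to be readable, so a colormap gradient like this may be more practical than trying to label every line.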

jAguesses