Setting a condition on one column's values while plotting other columns

Question

I have a question about plotting from a text data file that contains three columns (20000 rows). I would like to plot columns 2 and 3 (or a histogram of column 2). However, I would like the plot to include only the rows whose column 1 value lies in the range 100-250.

Note: One way may be to sort the data according to column one, but I don't know how to do that.

The sample of data is

174.2227   0.1624629285511385E+03  -0.6292327918805374E+02
 96.5364   0.9382981565234142E+02  -0.2269888520085278E+02
170.4995   0.1255471456652923E+03  -0.1153603193263530E+03
 70.3605   0.5622579821326531E+02  -0.4229968593987883E+02
 70.3641   0.1705414793985607E+02  -0.6826609764576108E+02
245.6546   0.1009630870343540E+03  -0.2239478772161106E+03
247.0803   0.2428952541481390E+03  -0.4528334882548071E+02
240.4885   0.1898105937624483E+03  -0.1476708453344265E+03
190.4206   0.2201049326187159E+01  -0.1904078537576801E+03
 58.0858   0.2315296872737939E+02  -0.5327192955482575E+02
263.4021   0.2480699465562589E+03  -0.8855483744759709E+02
 52.9697   0.1776581942067039E+02  -0.4990154780891378E+02
135.9583   0.1774572342000289E+02  -0.1347952056648868E+03
 79.8317   0.5762263417747670E+02  -0.5525152449053701E+02
155.5004   0.1506111928119825E+03  -0.3868642911295389E+02

I have tried the following code


import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np


file1 = "data1.txt"

a1 = np.loadtxt(file1,usecols=[0])
b1 = np.loadtxt(file1,usecols=[1])
c1 = np.loadtxt(file1,usecols=[2])

while 100 < a1 < 200:

   plt.plot(b1,c1,'k.')

plt.show()

score 0 · Answer 1 · answered Dec 08 '22 at 10:43

Use pandas to read your file and filter it.

import pandas as pd
# the sample columns are separated by runs of spaces, so split on any whitespace
df = pd.read_csv("your_file.tsv", sep=r'\s+', names=['col1', 'col2', 'col3'])
df_filtered = df[df['col1'].between(100, 250)]
df_filtered.plot(x='col1', y='col2', kind='hist')

cucurbit

No problem. Please consider marking the answer as accepted or upvoting it if it helped :). – cucurbit Dec 08 '22 at 14:08
actually, it never worked. sorry – jord Dec 08 '22 at 14:21

willwrighteng · Answer 2 · 2022-12-10T06:22:15.527

Your while condition kept throwing errors. I would use pandas (like @cucurbit said).

Data

File: data1.txt

174.2227   0.1624629285511385E+03  -0.6292327918805374E+02
 96.5364   0.9382981565234142E+02  -0.2269888520085278E+02
170.4995   0.1255471456652923E+03  -0.1153603193263530E+03
 70.3605   0.5622579821326531E+02  -0.4229968593987883E+02
 70.3641   0.1705414793985607E+02  -0.6826609764576108E+02
245.6546   0.1009630870343540E+03  -0.2239478772161106E+03
247.0803   0.2428952541481390E+03  -0.4528334882548071E+02
240.4885   0.1898105937624483E+03  -0.1476708453344265E+03
190.4206   0.2201049326187159E+01  -0.1904078537576801E+03
 58.0858   0.2315296872737939E+02  -0.5327192955482575E+02
263.4021   0.2480699465562589E+03  -0.8855483744759709E+02
 52.9697   0.1776581942067039E+02  -0.4990154780891378E+02
135.9583   0.1774572342000289E+02  -0.1347952056648868E+03
 79.8317   0.5762263417747670E+02  -0.5525152449053701E+02
155.5004   0.1506111928119825E+03  -0.3868642911295389E+02

Code

import pandas as pd
import matplotlib.pyplot as plt


def get_data():
    # read the raw text file as a list of lines
    filename = 'data1.txt'
    with open(filename, 'r') as file:
        lines = file.read().splitlines()

    # split each non-empty line on any run of whitespace
    # (skipping blank lines avoids a float('') error on a trailing newline)
    data_matrix = [line.split() for line in lines if line.strip()]

    # convert to a dataframe of floats
    df = pd.DataFrame(data_matrix, columns=['a1', 'b1', 'c1'])
    return df.astype(float)

def write_to_csv(df,filename):
    df.to_csv(filename, index=False)

def read_from_csv(filename):
    df = pd.read_csv(filename)
    return df

def plot_hist(df):
    # keep rows with 100 < a1 < 200, then histogram column b1 only
    tmp = df.loc[(df.a1 > 100) & (df.a1 < 200)]
    plt.hist(tmp.b1)
    plt.savefig('results.png')

def main():
    df = get_data()
    filename = 'intermediate-file.csv'
    write_to_csv(df,filename)
    df = read_from_csv(filename)
    plot_hist(df)
    
main()

Edit

Added the read/write-to-CSV steps above.
Another SO post has details on passing a custom delimiter to pd.read_csv() so that it can load your text file directly.
You can specify the data you'd like to plot in the call to plt.hist().
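
A rough sketch of those last two notes, not the exact code from the linked post: assuming the file is delimited by runs of spaces, pd.read_csv can load it directly with sep=r"\s+", so the intermediate CSV is not needed. The in-memory buffer and the output name hist_b1.png are stand-ins for illustration only; the 100-250 limits come from the question.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question. In practice, pass the path
# "data1.txt" to pd.read_csv instead of this in-memory buffer.
sample = io.StringIO(
    "174.2227   0.1624629285511385E+03  -0.6292327918805374E+02\n"
    " 96.5364   0.9382981565234142E+02  -0.2269888520085278E+02\n"
    "170.4995   0.1255471456652923E+03  -0.1153603193263530E+03\n"
    "263.4021   0.2480699465562589E+03  -0.8855483744759709E+02\n"
)

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter
df = pd.read_csv(sample, sep=r"\s+", header=None, names=["a1", "b1", "c1"])

# keep rows whose first column lies strictly between 100 and 250
selected = df[(df["a1"] > 100) & (df["a1"] < 250)]

# histogram of the second column only (c1 is not passed to plt.hist)
plt.hist(selected["b1"])
plt.savefig("hist_b1.png")  # hypothetical output file name
```

Passing a single column to plt.hist() is what keeps c1 out of the histogram.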

Thank you very much for your answer. I have two questions, please. 1- how can I add reading from a text or CSV file to your script? 2- can I do a Histogram for only the 2nd column b1 not to include c1 as in your Histogram? thanks — jord, Dec 08 '22 at 12:54
@jord please accept the answer if it satisfies your question. thanks! — willwrighteng, Dec 09 '22 at 01:55
It works, but I am trying to read only the main file, i.e. removing that x and bringing all data from the file. I cannot do that. — jord, Dec 09 '22 at 12:18

gboffi · Answer 3 · 2022-12-10T10:15:51.887

Look Ma', no Pandas.

indices = np.logical_and(a1min<a1, a1<a1max)
b2, c2 = b1[indices], c1[indices]
# do what you want to do with the filtered data

When you index a NumPy array along an axis with a boolean array of the same length as that axis, you filter the array along that axis: only the elements at positions where the boolean array is True remain.

Getting the boolean array for a single condition is simple: a1<a1max evaluates to an array that is False where a1[j]>=a1max and True where the condition is satisfied.
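
A small illustration of boolean indexing with a single condition. The array values are shortened from the question's sample, and the limit 250 is the upper bound the question asks for.

```python
import numpy as np

# shortened values from the question's sample (first and second columns)
a1 = np.array([174.2227, 96.5364, 263.4021, 135.9583])
b1 = np.array([162.46, 93.83, 248.07, 17.75])
a1max = 250

# element-wise comparison: one boolean per element of a1
mask = a1 < a1max
print(mask)

# indexing b1 with the mask keeps only the entries where mask is True
print(b1[mask])
```

The mask and the array it filters must have the same length along the indexed axis, which holds here because a1 and b1 come from the same rows.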

But if you try a1min<a1<a1max you'll get an exception!

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

To avoid the ValueError you have to combine the single conditions (two here, but there could be more) with an element-wise logical AND, which NumPy provides as the numpy.logical_and function, as shown at the beginning.
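
To see both behaviours side by side, here is a minimal sketch with made-up short arrays. It also uses the & operator, which is not mentioned above but performs the same element-wise AND on boolean arrays.

```python
import numpy as np

a1 = np.array([174.2227, 96.5364, 263.4021, 135.9583])
a1min, a1max = 100, 250

# Python evaluates a chained comparison as (a1min < a1) and (a1 < a1max);
# "and" needs a single truth value, which an array cannot provide.
try:
    a1min < a1 < a1max
except ValueError as err:
    print("chained comparison failed:", err)

# element-wise AND, written as a function call ...
mask_func = np.logical_and(a1min < a1, a1 < a1max)
# ... or with the & operator (each comparison must be parenthesised)
mask_op = (a1min < a1) & (a1 < a1max)

print(np.array_equal(mask_func, mask_op))
```

The parentheses around each comparison matter with & because it binds more tightly than < in Python.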

Two lines of code, no need to import a (very) large module.
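
For completeness, a sketch of how those two lines might sit in a full script. It assumes the file can be read by np.loadtxt with its default whitespace splitting; the in-memory buffer stands in for "data1.txt", and the output name filtered.png is made up for the example.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

# Rows copied from the question; replace the buffer with "data1.txt".
data = io.StringIO(
    "174.2227   0.1624629285511385E+03  -0.6292327918805374E+02\n"
    " 96.5364   0.9382981565234142E+02  -0.2269888520085278E+02\n"
    "245.6546   0.1009630870343540E+03  -0.2239478772161106E+03\n"
    "263.4021   0.2480699465562589E+03  -0.8855483744759709E+02\n"
)

# unpack=True returns one array per column, so the file is read only once
a1, b1, c1 = np.loadtxt(data, unpack=True)

# the two filtering lines from above, with the limits from the question
a1min, a1max = 100, 250
indices = np.logical_and(a1min < a1, a1 < a1max)
b2, c2 = b1[indices], c1[indices]

# scatter of column 2 against column 3, and a histogram of column 2
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 4))
ax_scatter.plot(b2, c2, "k.")
ax_hist.hist(b2)
fig.savefig("filtered.png")  # hypothetical output file name
```

No sorting by the first column is needed, since the mask selects rows wherever they appear in the file.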
