Below I describe what I am trying to plot. My main question is in the last paragraph.

I tried to plot a graph showing time on the x-axis and value on the y-axis. I have six columns in my text file where time is in the 3rd column and value (height or elevation in cm) is in the last column.

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Area(4).txt',delimiter='\t')
print(df.columns)

The file consists of elevations recorded over a month, and it includes some unnecessary/missing values, which I filter out below. I want to plot the recordings between 26-30 April, so I did the following:

df.iloc[[600]]
df.iloc[[695]]

df16=df.iloc[601:696,:]  # this takes the values between 26-29 April
print(df16)

df17=df16[df16.Value!=9999] #filtering of values I don't need
print(df17)

After doing all this, the output I require later looks like this:

     No    Date              Time    Rand  Col   Value
601  2762  26  4 1991         1:00   231    2    335
603  2764  26  4 1991         3:00   231    4    255
606  2767  26  4 1991         6:00   231    7    185
608  2769  26  4 1991         8:00   231    9    135
609  2770  26  4 1991         9:00   231   10    117
610  2771  26  4 1991        10:00   231   11    125
612  2773  26  4 1991        12:00   232    1    301
613  2774  26  4 1991        13:00   232    2    350
614  2775  26  4 1991        14:00   232    3    370
616  2777  26  4 1991        16:00   232    5    275
618  2779  26  4 1991        18:00   232    7    200
620  2781  26  4 1991        20:00   232    9    140
621  2782  26  4 1991        21:00   232   10    115
622  2783  26  4 1991        22:00   232   11    125
624  2785  27  4 1991         0:00   233    1    315
625  2786  27  4 1991         1:00   233    2    377
627  2788  27  4 1991         3:00   233    4    285
630  2791  27  4 1991         6:00   233    7    210
632  2793  27  4 1991         8:00   233    9    155
633  2794  27  4 1991         9:00   233   10    130
634  2795  27  4 1991        10:00   233   11    113
636  2797  27  4 1991        12:00   234    1    285
638  2799  27  4 1991        14:00   234    3    390
640  2801  27  4 1991        16:00   234    5    325
642  2803  27  4 1991        18:00   234    7    225
644  2805  27  4 1991        20:00   234    9    161
646  2807  27  4 1991        22:00   234   11    115
647  2808  27  4 1991        23:00   234   12    131
648  2809  28  4 1991         0:00   235    1    275
649  2810  28  4 1991         1:00   235    2    390
650  2811  28  4 1991         2:00   235    3    370
651  2812  28  4 1991         3:00   235    4    335
654  2815  28  4 1991         6:00   235    7    255
656  2817  28  4 1991         8:00   235    9    175
658  2819  28  4 1991        10:00   235   11    121
659  2820  28  4 1991        11:00   235   12    125
660  2821  28  4 1991        12:00   236    1    330
662  2823  28  4 1991        14:00   236    3    425
663  2824  28  4 1991        15:00   236    4    422
664  2825  28  4 1991        16:00   236    5    375
666  2827  28  4 1991        18:00   236    7    255
668  2829  28  4 1991        20:00   236    9    175
670  2831  28  4 1991        22:00   236   11    118
671  2832  28  4 1991        23:00   236   12    107
672  2833  29  4 1991         0:00   237    1    245
673  2834  29  4 1991         1:00   237    2    380
674  2835  29  4 1991         2:00   237    3    415
675  2836  29  4 1991         3:00   237    4    375
678  2839  29  4 1991         6:00   237    7    265
680  2841  29  4 1991         8:00   237    9    190
682  2843  29  4 1991        10:00   237   11    122
683  2844  29  4 1991        11:00   237   12    105
684  2845  29  4 1991        12:00   238    1    180
686  2847  29  4 1991        14:00   238    3    415
687  2848  29  4 1991        15:00   238    4    460
688  2849  29  4 1991        16:00   238    5    425

But when I tried to plot a graph with 'Time' on the x-axis and 'Value' (elevation in cm) on the y-axis, I didn't get a plot at all. The way I tried to plot is probably too simple or wrong, but how can I get the plot? I also want the x-axis ticks to be spaced 6 hours apart and the y-axis ticks to be spaced 2 m apart.

df17.plot(kind='scatter', x='Time', y='Value')
plt.show()
Kurapika
  • It traced back my recent call and I got an output like KeyError: 'Time in hours' – Kurapika Jul 08 '22 at 00:02
  • I think your "csv" file might have some incorrect column delimitation. I took a subset of your data above and built a test "csv" file. Making sure the column names were properly tab delimited and the entries in each row were properly tab delimited, I got a graph to display. When you printed out the columns, did you get output like the following: Index(['No', 'Date', 'Time', 'Rand', 'Col', 'Value'], dtype='object') ? – NoDakker Jul 08 '22 at 00:19
  • @IgnatiusReilly Yes, I have a column called "Time". I showed what my text file looks like in my question. It's the third column. – Kurapika Jul 08 '22 at 00:21
  • What's the output of `print(df17.columns.tolist())` – Ignatius Reilly Jul 08 '22 at 00:21
  • @IgnatiusReilly I get this output: ['No', 'Date', ' Time', 'Rand', 'Col', 'Value'] – Kurapika Jul 08 '22 at 00:22
  • @NoDakker Would it be possible for you to show you managed to get a graph? If I print the columns I have in my text file, I get this: Index(['No', 'Date', ' Time', 'Rand', 'Col', 'Value'], dtype='object') – Kurapika Jul 08 '22 at 00:24
  • KeyErrors in pandas occur when you refer to a column by a name that's not its "real" name, typically because of leading or trailing spaces. Just try this: create a new variable `cols = df17.columns.tolist()` and then plot with `df17.plot(kind='scatter', x=cols[2], y=cols[5])`. – Ignatius Reilly Jul 08 '22 at 00:37
  • @IgnatiusReilly Yes, I do get a scatter plot. However, if I understand correctly, I do not want to plot the first column/No as my x-axis. The first column does not mean anything in my dataset. I want to plot the 'Time' column in the x-axis. – Kurapika Jul 08 '22 at 00:38
  • @IgnatiusReilly Thank you for the KeyError explanation. I tried to use `df17.plot(kind='scatter', x=cols[2], y=cols[5])` and I get NameError: name 'cols' is not defined. – Kurapika Jul 08 '22 at 00:41
  • First assign `cols = df17.columns.tolist()` – Ignatius Reilly Jul 08 '22 at 00:43
  • @IgnatiusReilly Yes, thank you. I assigned cols and tried to plot again. However, this appears: ValueError: scatter requires x column to be numeric. Is this because the time is given as 1:00, 2:00, 3:00? Is there a way to bypass this issue? – Kurapika Jul 08 '22 at 01:03
  • You can get a pandas plot by replacing the spaces in the date column with slashes and combining it with the time column into a single datetime column. `df['Date'] = df['Date'].apply(lambda x: x.replace(' ','/'));df['DateTime'] = pd.to_datetime(df['Date'].str.cat(df['Time'], sep=' '));df.plot(kind='line',x='DateTime', y='Value')` – r-beginners Jul 08 '22 at 01:21
  • @IgnatiusReilly I decided to just do a regular plot and not a scatter plot. My graph appears but it looks a bit disorganized. For example, my x-axis 'Time' is labelled like this: 0:00, 16:00, 9:00, 1:00, 16:00, 8:00. Do you know how I can customize the plot in pandas so that the x-axis ticks are spaced by 6 hours and the y-axis ticks are spaced by 2m? – Kurapika Jul 08 '22 at 01:24
  • @r-beginners I want to clarify that the Date column and Time column in my dataset are separate columns and I only want the 'Time' column for my graph. I still tried to run what you suggested to see the output but I got KeyError: 'DateTime' – Kurapika Jul 08 '22 at 01:45
  • My comment is that the date column contains spaces, so replace them with slashes to make it date format. I concatenate the time column with it to make it a DateTime column. I draw a graph in Pandas with that new column as the x-axis. If you need to adjust the time series further, it is easier to use matplotlib to handle it. – r-beginners Jul 08 '22 at 01:53
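
The datetime idea from the last few comments can be sketched roughly as follows. This is only an illustration under assumptions: the three sample rows are copied from the table in the question, the `Date` and `Time` columns are assumed to be read as text, and the runs of spaces inside `Date` are collapsed before parsing (a step the comment above does not mention). The names `raw`, `date_text` and `time_text` are made up for the example.

```python
import io
import pandas as pd

# Three rows copied from the table in the question, tab-delimited.
# (Assumption: the real file has the same layout.)
raw = (
    "No\tDate\tTime\tRand\tCol\tValue\n"
    "2762\t26  4 1991\t1:00\t231\t2\t335\n"
    "2764\t26  4 1991\t3:00\t231\t4\t255\n"
    "2785\t27  4 1991\t0:00\t233\t1\t315\n"
)
df = pd.read_csv(io.StringIO(raw), delimiter="\t")

# Guard against stray spaces around column names such as ' Time'.
df.columns = df.columns.str.strip()

# Collapse repeated spaces so "26  4 1991" becomes "26 4 1991".
date_text = df["Date"].str.split().str.join(" ")
time_text = df["Time"].str.strip()

# Day, month, year, then hour:minute, combined into one timestamp column.
df["DateTime"] = pd.to_datetime(
    date_text + " " + time_text, format="%d %m %Y %H:%M"
)

# Sorting keeps the readings in chronological order for plotting.
df = df.sort_values("DateTime")
print(df[["DateTime", "Value"]])
```

With a real datetime column on the x-axis, something like `df.plot(x='DateTime', y='Value')` should place the points in time order instead of treating '1:00', '3:00', ... as unordered text labels.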

1 Answer

I am responding to your request in the comments to show how I set up the initial data and then ran your program, with a couple of tweaks to fit my subset of the data.

First off, I copied in about twenty-five rows of data from your sample above. I made sure that the column names were tab-delimited and that each data element in each row was tab-delimited. This is the data I placed into the file (I named it "Area.txt").

No  Date    Time    Rand    Col Value
2762    26  4 1991  1:00    231 2   335
2764    26  4 1991  3:00    231 4   255
2767    26  4 1991  6:00    231 7   185
2769    26  4 1991  8:00    231 9   135
2770    26  4 1991  9:00    231 10  117
2771    26  4 1991  10:00   231 11  125
2773    26  4 1991  12:00   232 1   301
2774    26  4 1991  13:00   232 2   350
2775    26  4 1991  14:00   232 3   370
2777    26  4 1991  16:00   232 5   275
2779    26  4 1991  18:00   232 7   200
2781    26  4 1991  20:00   232 9   140
2782    26  4 1991  21:00   232 10  115
2783    26  4 1991  22:00   232 11  125
2785    27  4 1991  0:00    233 1   315
2786    27  4 1991  1:00    233 2   377
2788    27  4 1991  3:00    233 4   285
2791    27  4 1991  6:00    233 7   210
2793    27  4 1991  8:00    233 9   155
2794    27  4 1991  9:00    233 10  130
2795    27  4 1991  10:00   233 11  113
2797    27  4 1991  12:00   234 1   285
2799    27  4 1991  14:00   234 3   390
2801    27  4 1991  16:00   234 5   325
2803    27  4 1991  18:00   234 7   225

Then, using your code sample above, I tweaked it to read the data I had placed into the test file.

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Area.txt',delimiter='\t')
print(df.columns)

df16=df.iloc[0:24,:]  # this takes the values I entered into the "csv" file but should work the same as your range.
print(df16)

df17=df16[df16.Value!=9999] #filtering of values I don't need
print(df17)

df17.plot(kind='scatter', x='Time', y='Value')
plt.show()

That yielded the following graph.

Sample Graph

I think the part of the terminal output that confirmed I would get a graph was the printed list of data frame columns, shown right after reading the data from the "csv" file.

Index(['No', 'Date', 'Time', 'Rand', 'Col', 'Value'], dtype='object')

So again, because I had a relatively small subset of your data, I could make sure that the data was clean. I hope that helps you troubleshoot your project.
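
For comparison, the column list you reported in the comments contains ' Time' with a leading space, which would explain the KeyError when asking for 'Time'. Here is a minimal sketch of how one might spot and clean that, assuming the header of your file looks like the hypothetical `raw` string below (I do not have your actual file, so this is a guess at its layout).

```python
import io
import pandas as pd

# Hypothetical sample: the header has a stray space before "Time",
# mimicking the column list reported in the comments.
raw = (
    "No\tDate\t Time\tRand\tCol\tValue\n"
    "2762\t26  4 1991\t1:00\t231\t2\t335\n"
    "2764\t26  4 1991\t3:00\t231\t4\t255\n"
)

# repr() makes tabs (shown as \t) and stray spaces visible in the header.
header_line = raw.splitlines()[0]
print(repr(header_line))

df = pd.read_csv(io.StringIO(raw), delimiter="\t")
print(df.columns.tolist())  # the third name still carries its space

# Remove leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()
print(df.columns.tolist())  # 'Time' can now be used as a column key
```

With a real file, the first line can be inspected the same way by reading it with `open(...)` and `readline()` and printing its `repr()`.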

Regards.

NoDakker
  • Thank you, I still have some questions. (1) Could you explain to me the part about how you ensured that the column names were tab-delimited and that each data element in each row was tab-delimited? This is the first time I am hearing this so I'm not sure how I am supposed to check this aspect of my text file on my end. – Kurapika Jul 08 '22 at 01:40
  • (2) While I couldn't get a scatter plot, I managed to get a line plot later on, but it looks very messy and is hard to interpret. Do you know how I can plot the graph so that the x-axis ticks are evenly spaced at 6-hour intervals (starting from 12am, then 6am, 12pm, 6pm, etc.) and the y-axis ticks are evenly spaced by 100 cm? – Kurapika Jul 08 '22 at 01:41
  • The reason I knew that my column names were tab-delimited was that I knew that I had used the tab key when entering in the data. I don't know what other apps you have but if you have a spreadsheet application that allows the import of csv file data, you might do a test import indicating that the tab key is the delimiter and see if the spreadsheet looks correct. Regarding comment #2, I am not well versed in that. However, I found this link [matplotlib spacing](https://stackoverflow.com/questions/52523710/matplotlib-increase-spacing-between-points-on-x-axis) that might help. – NoDakker Jul 08 '22 at 01:54
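
On the tick spacing asked about in comment (2), one possible approach is to set the tick locators directly in matplotlib. The sketch below is an assumption-laden illustration, not a tested fix for the original file: the six readings are copied from the table in the question, the timestamps are already parsed as datetimes, and the label format is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MultipleLocator

# A few readings copied from the table in the question (elevation in cm).
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "1991-04-26 01:00", "1991-04-26 06:00", "1991-04-26 14:00",
        "1991-04-26 22:00", "1991-04-27 01:00", "1991-04-27 10:00",
    ]),
    "Value": [335, 185, 370, 125, 377, 113],
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df["DateTime"], df["Value"], marker="o")

# Major x ticks at 00:00, 06:00, 12:00 and 18:00 of every day.
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b %H:%M"))

# Major y ticks at every multiple of 100 (cm).
ax.yaxis.set_major_locator(MultipleLocator(100))

ax.set_xlabel("Time")
ax.set_ylabel("Elevation (cm)")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Calling `plt.show()` afterwards displays the figure. Changing `MultipleLocator(100)` to another step changes the y spacing in the same way.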