0

I have an Excel file from which I need to extract data at given intervals of time and angles. Easy taskwith SQL but I need to do it with pandas now. I can't figure ou if I can attach the actual file, but here are the names of the columns from the CSV file :

time;ms;interface;packet;GEN_Src;GEN_Dest;GEN_Class;GEN_Type;GEN_Length;GEN_cycle_cnt;lrt_data_type;corr_src;corr_acq_cnt;corr_fragment;corr_fragment_cnt;corr_acq_time;;;;;;;;;;;;;;;;;;;

There are actually dozens of columns without a name at the end, in which lie the data I need to extract. Slicing should work for this.

However, I need to separate them according to intervals of time (column 'time' ) and angles (column 'corr_src').

When I run :

df = pd.read_csv(r'path\EM_Cell_A-Sample1.csv', index_col='time')

I get :

pandas.errors.ParserError: Could not construct index. Requested to use 1 number of columns, but 1088 left to parse.

Previous attempts, driven by solutions found here : How to check if a column exists in Pandas indicate that the 'time' column does not exist.

Archerlite
  • 25
  • 4
  • What are you trying to do? Why `reset_index` at all? Indexes are per series (column) not rows. The default is just the row number. Resetting the index in this code doesn't do anything useful to begin with. `copy` itself is a waste of RAM and the result will be an actual copy of that highly fragmented dataframe. – Panagiotis Kanavos Jul 17 '23 at 07:18
  • `data from an Excel file` you're reading a text file. CSV isn't an Excel format, it's a text file with Values Separated by Commas. That's why `read_csv` works. Excel (xlsx) files are ZIP packages containing XML files in a standardized schema. – Panagiotis Kanavos Jul 17 '23 at 07:21
  • Thank you for the answer. I have edited my post. I also tried with read_excel and xlsx files and the errors are the same. The file was saved both as csv and xlsx. – Archerlite Jul 17 '23 at 07:30
  • Don't try things at random or use recipes that look vaguely related. Clean up the question because it's still unclear what you're trying to do. What you posted is the failed random attempts. Post an example of the file. `CSV` is text, period. Trying to read it like `xlsx` is *guaranteed* to fail. What does `time` contain, how did you try to turn it into an index? `.issuperset(['time'])` doesn't say anything. It's the wrong function to begin with. If you wanted to test for existence, `'time' in df.columns` – Panagiotis Kanavos Jul 17 '23 at 07:35
  • You can specify the column to use as index while reading the file itself with the `index_col` argument of [read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) : `df=pd.read_csv(path,index_col='time')`. To change the index to a different column or columns, use [set_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html) – Panagiotis Kanavos Jul 17 '23 at 07:37
  • I used set_index to try to turn the column 'time' into an index. It would not work as "column ['time'] could not be found" – Archerlite Jul 17 '23 at 08:06
  • The default delimiter is `,` not `;`. You can specify a different delimiter with `sep` or `delimiter`, eg `sep=';'` – Panagiotis Kanavos Jul 17 '23 at 08:22
  • If `;` is the field separator I bet `,` is used as the decimal separator. This means you also need to use `decima=','` – Panagiotis Kanavos Jul 17 '23 at 08:30

1 Answers1

0

Update

It seems the real problem is reading a CSV file that uses ; as the field separator. Half the world uses , as the decimal separator which means another character, typically ;, is used as the field separator

df=pd.read_csv(path,index_col='time',sep=';',decimal=',')

The question is unclear. There's no reason to create a copy of the original dataframe and reset_index() won't create a new index. The warning itself may not be relevant in Pandas 2.0.0

It seems that the real question is how to read a text file with comma separated values, using the time column as index. This can be done by specifying the index column in read_csv

df=pd.read_csv(path,index_col='time')

CSV isn't an Excel format, it's a text format containing Values Separated by Commas. Excel files (xlsx) are ZIP packages containing XML files. You can't read a text file with read_excel or an Excel file with read_csv.

Indexes are series, not rows. When no index is specified, the row number is used instead.

To check whether a value exists in a Python container use in. That includes Series or indexes and df.columns :

if 'time' in df.columns :
...

To change the index to a different one use set_index :

df.set_index(['interface','time']

reset_index does the opposite - it reverts the dataframe to the original index.

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • Following what you wrote, the following should not happen : `>>> print('time' in df.columns)` `False` – Archerlite Jul 17 '23 at 08:16
  • `;` isn't a comma (',')`. You can specify a different delimiter with `sep` or `delimiter`, eg `sep=';'` – Panagiotis Kanavos Jul 17 '23 at 08:22
  • I tried and it solved this problem. I can now assign the column time as the index. Thank you. I checked, and I have actuallyu asked excel to save the file with `,` as the separator and not `;` . I don't know where it went wrong. – Archerlite Jul 17 '23 at 08:52