0

I have this big data in csv file:

  1. I manage to open this on Jupyter Notebook.
  2. The data in csv example: 1 2 3 4 5 6 7 8 9 10
  3. And I wanted to open in the notebook as '3 windows rolling' without doing any (sum,mean for example)
  4. The output I want in the notebook are>>

Output dataset

furas
  • 134,197
  • 12
  • 106
  • 148
NJM
  • 1
  • 1
  • 2
    pandas DataFrame.shift(3) is this what you need? – zuijiang May 08 '21 at 05:55
  • as @jiangzui mentioned you can use `shift()`, First read csv to get first column, next use `shift` to get other columns `df["second"] = df["first].shift(-1)` `df["third"] = df["first].shift(-2)` – furas May 08 '21 at 07:09

1 Answers1

0

First open csv to get first column.

import pandas as pd

df = pd.read_csv("filename.csv")

I will use io only to simulate data from file

text = """first
1
2
3
4
5
6
7
8
9
10"""


import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))

Result

   first
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10

Next you can use shift to create other columns

df['second'] = df['first'].shift(-1)
df['third'] = df['first'].shift(-2)

Result

   first  second  third
0      1     2.0    3.0
1      2     3.0    4.0
2      3     4.0    5.0
3      4     5.0    6.0
4      5     6.0    7.0
5      6     7.0    8.0
6      7     8.0    9.0
7      8     9.0   10.0
8      9    10.0    NaN
9     10     NaN    NaN

At the end you can remove two last rows with NaN and convert all to integer

df = df[:-2].astype(int)

or if you don't have NaN in other places

df = df.dropna().astype(int)

Result:

   first  second  third
0      1       2      3
1      2       3      4
2      3       4      5
3      4       5      6
4      5       6      7
5      6       7      8
6      7       8      9
7      8       9     10

Minimal working code

text = """first
1
2
3
4
5
6
7
8
9
10"""

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['first'])
print(df)

df['second'] = df['first'].shift(-1) #, fill_value=0)
df['third'] = df['first'].shift(-2)
print(df)

#df = df.dropna().astype(int)
df = df[:-2].astype(int)
print(df)

EDIT:

The same using for-loop to create any number of columns

text = """col 1
1
2
3
4
5
6
7
8
9
10"""

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['col 1'])
print(df)

number = 5

for x in range(1, number+1):
    df[f'col {x+1}'] = df['col 1'].shift(-x)
print(df)

#df = df.dropna().astype(int)
df = df[:-number].astype(int)
print(df)

Result

   col 1
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10

   col 1  col 2  col 3  col 4  col 5  col 6
0      1    2.0    3.0    4.0    5.0    6.0
1      2    3.0    4.0    5.0    6.0    7.0
2      3    4.0    5.0    6.0    7.0    8.0
3      4    5.0    6.0    7.0    8.0    9.0
4      5    6.0    7.0    8.0    9.0   10.0
5      6    7.0    8.0    9.0   10.0    NaN
6      7    8.0    9.0   10.0    NaN    NaN
7      8    9.0   10.0    NaN    NaN    NaN
8      9   10.0    NaN    NaN    NaN    NaN
9     10    NaN    NaN    NaN    NaN    NaN

   col 1  col 2  col 3  col 4  col 5  col 6
0      1      2      3      4      5      6
1      2      3      4      5      6      7
2      3      4      5      6      7      8
3      4      5      6      7      8      9
4      5      6      7      8      9     10
furas
  • 134,197
  • 12
  • 106
  • 148
  • However, what if I have the CSV file value for example: if the value of 1 2 3 4 5 6 7 8 9 10, is a random number? how should I execute it? @furas – NJM May 08 '21 at 08:27
  • As I wrote in answer - you should read it from CSV - I used `range()` only to simulate some data. – furas May 08 '21 at 08:44
  • now I used `read_csv()` with `io.StingIO()` to simulate some data - but you should use `read_csv(filename)` – furas May 08 '21 at 08:52
  • Should I use sliding window or rolling if my value from the previous csv file is: 1 2 3 4 5 6 7 8 9 10 and changes to >> for example 0.1 0.98 0.88 0.976? – NJM May 08 '21 at 09:31
  • it doesn't matter what values you have. But I saw your screenshot which shows values in row - so question is if you want to duplicate column or row. I assumed that you have values in column and you want to create new columns. But this method will not work if you want to create new rows. – furas May 08 '21 at 09:37
  • The column as you can see from my screenshot was 50, 60, 70.. this can be ignored actually, but if I have a value of 0.1 0.98 0.88 0.976 instead of 1 2 3 4, can it be done as what you answer previously? because the 1 2 3 4... is in sequence and it can use the range function, while the data I have in a row is different value? – NJM May 08 '21 at 09:51
  • as I said before - it doesn't matter what values you have. I used range only to generate some data for example - I could use `random` for this or put some numbers from my head (even `0.1 0.98 0.88 0.976`) but `range` was simpler to use. But now my answer use `read_csv` and it doesn't matter if you have file with `1,2,3` or `0.1 0.98 0.88 0.976` or even with strings `"hello, world, of, python`. Simply try to use it instead of asking if it will works with other values. – furas May 08 '21 at 10:15
  • Frankly, if you would put values `0.1 0.98 0.88 0.976` as text in your question then I would use them - I would simply copy-paste them - but you put as example `1 2 3` so I used `1 2 3` as example. Next time put some real values in question and then people will use this values in code. – furas May 08 '21 at 10:21
  • Thank you furas for the answer! My bad that I use the example in the wrong way :( – NJM May 08 '21 at 10:31
  • Hi furas i cant drop the last 2 rows with my existing data that contains float dataframes – NJM May 10 '21 at 03:08
  • it doesn't matter what values you have - it should be `df = df[:-2]` – furas May 10 '21 at 04:11
  • Hi furas, I wanted to include in column 4 either "0" or "1" if the last row of column 1 is less than 0.85 in my data frames. then my last 6 rows will be at "1" at column 4, and the rest of it will be 0. Whereby if my last row is more than 0.85, the rest of column 4 will at "0". – NJM May 15 '21 at 02:06
  • create new question on new page and you will have more space for descripton and example - before and after. I don't understand your problem. – furas May 15 '21 at 02:24
  • Ok I already posted a new one – NJM May 15 '21 at 10:12